FOREWORD


The new results contained in this monograph were obtained by the authors in the joint work they carried out in Chicago, Illinois, and Stockholm, Sweden, during the years 1952-1953. Our thanks are due to the Office of Naval Research and the University of Chicago for their generous support of our collaboration. The efforts of H. Cramér, H. Robbins, and W. A. Wallis were most helpful in carrying on the work on the book and in its completion. We have had the benefit of the advice of many of our colleagues. We feel especially indebted to J. L. Doob, F. Mosteller, W. J. Pierson, Jr., and L. J. Tick for their helpful comments.


PREFACE 


The purpose of this book is twofold. It is written in the terminology of the theoretical statistician because one of our objectives is to direct his attention to an approach to time series analysis that is essentially different from most of the techniques used by time series analysts in the past. The second objective is to present a unified treatment of methods that are being used increasingly in the physical sciences and technology. We hope that the book will be of considerable interest to research workers in these fields. Keeping the first objective in mind, we have given a rigorous mathematical discussion of these new topics in time series analysis. With few exceptions, the existing literature in time series analysis is characterized by a lack of precision both in conception and in the mathematical treatment of the problems dealt with. To avoid this vagueness we have devoted more space to rigorous proofs than may appear necessary to some readers, but we believe that a study of the proofs will furnish valuable clues to the practical validity of the results and be an important guide to intuition. We have tried to balance the formal proofs with intuitive remarks and comments on practical applications. While the regularity assumptions we have required in many cases may seem restrictive, appropriately interpreted they indicate the range in which the methods are practically valid. We have made such interpretations in the comments accompanying the formal proofs.

The reader is assumed to have a knowledge of statistics and basic probability theory equivalent to that contained in H. Cramér, Mathematical Methods of Statistics. The statistical techniques suggested in this monograph use concepts and relations from the theory of stochastic processes. However, we shall develop the results we need in the first two chapters. The reader who is not familiar with the mathematical techniques used in this book may find it easier to skip some of the more difficult proofs during a first reading. This is especially true with respect to Chapters 2, 4, 6 and 7. A deeper understanding of the results (and their limitations) will be obtained by returning to these proofs later.

The early attempts to formulate a theory for the statistical analysis of time series made use of a rather simple model. The observed series $y_t$ was considered as the sum of a component $m_t$ with no stochastic element in it and a disturbance $x_t$, where the $x_t$'s are supposed to be independent and identically distributed. As an example consider the case


$$m_t = \sum_{\nu=1}^{p} A_\nu \cos(t\lambda_\nu + \varphi_\nu)$$

and let $x_t$ be normally distributed with mean $m$ and variance $\sigma^2$, where the constants are not specified. A typical problem would then be to estimate one or several of these parameters. A more general and flexible approach to this subject was based upon the assumption, made explicitly or not, that the underlying stochastic process was one of the so-called finite parameter schemes. These include the moving average

$$x_t = a_0 \xi_t + a_1 \xi_{t-1} + \cdots + a_p \xi_{t-p}$$

and the autoregressive scheme, defined as a solution of the difference equation

$$b_0 x_{t+p} + b_1 x_{t+p-1} + \cdots + b_p x_t = \xi_t.$$

Here $\{\xi_t\}$ is a sequence of independently and identically distributed stochastic variables and $\{a_\nu\}$, $\{b_\nu\}$ are constants. Modified processes with a nonrandom trigonometric or polynomial regression $m_t$ superimposed, $y_t = x_t + m_t$, were also considered. The nonnegative integer $p$ is called the order of the scheme.
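
As a minimal sketch of the two finite parameter schemes, assuming NumPy and zero starting values for the autoregression (our assumption, not part of the text), the moving average and a forward recursive solution of the difference equation can be written as follows. The function names are illustrative only.

```python
import numpy as np

def moving_average(a, xi):
    """x_t = a_0 xi_t + a_1 xi_{t-1} + ... + a_p xi_{t-p} for every t with a full window."""
    # 'valid' convolution keeps only the positions t = p, ..., len(xi) - 1
    return np.convolve(np.asarray(xi, dtype=float), np.asarray(a, dtype=float), mode="valid")

def autoregressive(b, xi):
    """Solve b_0 x_{t+p} + b_1 x_{t+p-1} + ... + b_p x_t = xi_t forward in t.

    The p starting values x_0, ..., x_{p-1} are set to zero (an assumption).
    """
    b = np.asarray(b, dtype=float)
    p = len(b) - 1
    x = np.zeros(len(xi) + p)
    for t in range(len(xi)):
        # x[t:t+p][::-1] = (x_{t+p-1}, ..., x_t) pairs with (b_1, ..., b_p)
        x[t + p] = (xi[t] - np.dot(b[1:], x[t:t + p][::-1])) / b[0]
    return x

rng = np.random.default_rng(0)
xi = rng.standard_normal(500)                  # i.i.d. N(0, 1) innovations
x_ma = moving_average([1.0, 0.6, 0.3], xi)     # second order moving average
x_ar = autoregressive([1.0, -0.5], xi)         # x_{t+1} - 0.5 x_t = xi_t
```

The zero start introduces a transient in the autoregression, so in practice one would discard the first stretch of `x_ar` before treating it as a realization of a stationary scheme.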

These schemes have been important in the development of methods for the statistical analysis of time series. They have been used with varying degrees of success to describe many types of phenomena encountered in applications. From the discussion in Chapter 1 it will be apparent that by using these schemes it is possible to approximate a large and important class of stationary processes, viz. the so-called linear processes (see section 1.6). For this to be possible, $p$ must take large rather than small values and the parameters involved in the scheme must be adjusted adequately.

During the last ten years a good deal of work has been devoted to the construction of tests, estimates and confidence intervals appropriate for these schemes. We have described a few of the more important of these results in Chapter 3. In spite of the ingenuity and great theoretical interest of some of these methods, their practical applicability seems to be severely limited by the assumption that the process is a low order (usually zero, first or second order) finite parameter scheme. After surveying a good deal of the applied literature on the statistical analysis of time series met with in practice, we have come to the following conclusion.

Only in a few special cases (some of which will be discussed later in this book) does it seem reasonable to assume on a priori grounds that the process is a low order finite parameter scheme. As noted above, we can still approximate the process by a scheme of sufficiently high order and then use one of the methods developed to test the fit. This procedure is legitimate, however, only if we take into consideration the power of the test; usually this power will be rather small for moderate sample sizes. Hence, when we lack information concerning the structure of the process, we will have to develop methods that are more generally valid. If this is not possible, we should hesitate to make quantitative statistical statements based on seemingly objective methods, which may hide the weak points in the argument and give the research worker an illusory feeling of security.

At first it may seem impossible to construct methods of inference valid for the large classes of stationary processes we have in mind. Indeed, once we leave the finite parameter schemes, we deal with classes of probability distributions characterized by an infinite number of parameters, and the finite sample gives us information concerning these parameters. This is the same problem that is encountered in the study of nonparametric hypotheses, although in the present context we will have to be prepared to tackle even more complex analytical difficulties.

It may be of interest to mention the two sources of ideas that we have found most useful. The first is the applied literature, especially papers dealing with statistical questions in the natural sciences and engineering. The statistician intending to do research work in this field will benefit by getting in touch with the wealth of statistical research presented in the main journals in these fields. Some of these journals are listed in our bibliography. Second, some knowledge of the modern theory of probability, particularly the theory of stochastic processes, is indispensable. A complete and rigorous exposition of this subject is Doob: Stochastic Processes. This can be supplemented with Blanc-Lapierre and Fortet: Théorie des Fonctions Aléatoires, where some of the emphasis is on applications to physics.

Results have only occasionally been put in the form of theorems. This is to emphasize that they should not be considered as parts of a rigid system that can be used immediately. In practical applications, modifications and extensions will usually be needed.

The nonparametric approach we have spoken of has been used quite recently in various fields of the physical sciences and technology, although in a somewhat disguised form. On closer scrutiny, one can see that some of the basic problems dealt with in these fields are concerned with estimation of the spectrum of time series, detection of signals, and other statistical problems of the type discussed in this book. The success of these methods in these concrete contexts seems to be due to the fact that people in these fields know a good deal about the structure of the random phenomena studied and so have been able to devise appropriate and relevant techniques. This can be contrasted with the rather mechanical methods of time series analysis used by theoretical statisticians in the past. The power of these new techniques is to be attributed to their nonparametric character. We have especially profited from reading the many stimulating papers in the current engineering literature. Many such papers can be found in the bibliography, and we strongly advise the interested reader to examine some of them. They are especially valuable because of the problems they pose.

The basic probability model considered in this monograph is that of a stochastic process (or sequence of random variables)

$$y_t = x_t + m_t, \qquad E y_t = m_t, \qquad t = \ldots, -1, 0, 1, \ldots$$

with mean value $m_t = \sum_\nu c_\nu \varphi_t^{(\nu)}$ and known regression vectors $\varphi^{(\nu)} = (\ldots, \varphi_t^{(\nu)}, \ldots)$. The residuals $x_t$ are assumed to form a stationary stochastic process, that is, a process whose probability distribution is invariant under time shifts. This means that $x_t$ is a stable random mechanism. In particular, it then follows that the covariance sequence

$$\operatorname{cov}(y_t, y_\tau) = E x_t x_\tau = r_{t,\tau} = r_{t-\tau}$$

depends only on the time difference $t - \tau$. Such a model fits data arising over moderate lengths of time in studies of random noise and in problems of turbulence and oceanography. The model is also used in small scale investigations in meteorology. The covariances $r_n$ are the Fourier-Stieltjes coefficients

$$r_n = \int_{-\pi}^{\pi} e^{in\lambda}\, dF(\lambda)$$

of a bounded nondecreasing function $F(\lambda)$. The function $F(\lambda)$ is called the spectral distribution function of the process, and knowledge of the spectrum is equivalent to knowledge of the covariance sequence. It turns out to be much more convenient statistically to deal with the spectrum rather than with the covariance sequence.
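
To make the Fourier-Stieltjes relation concrete, here is a sketch for the case where $F$ has a density $f$, so that $dF(\lambda) = f(\lambda)\,d\lambda$. As an assumed test case we take the first order moving average $x_t = \xi_t + a\xi_{t-1}$ with $E\xi_t^2 = \sigma^2$, whose spectral density is $f(\lambda) = (\sigma^2/2\pi)\,|1 + a e^{-i\lambda}|^2$; its covariances are $r_0 = \sigma^2(1 + a^2)$, $r_{\pm 1} = a\sigma^2$ and $r_n = 0$ for $|n| \ge 2$. The helper name and the equally spaced quadrature rule are our own choices.

```python
import numpy as np

def covariances_from_density(f, n_max, m=4096):
    """Approximate r_n = int_{-pi}^{pi} e^{i n lam} f(lam) dlam for n = 0, ..., n_max.

    Uses m equally spaced points on [-pi, pi); for a trigonometric polynomial f of
    degree d this rule is exact (up to rounding) whenever n_max + d < m.
    """
    lam = -np.pi + 2.0 * np.pi * np.arange(m) / m
    weights = f(lam) * (2.0 * np.pi / m)        # f(lam) dlam on each grid cell
    n = np.arange(n_max + 1)
    r = np.exp(1j * np.outer(n, lam)) @ weights
    return r.real                               # imaginary part vanishes for even f

a, sigma2 = 0.6, 2.0
f_ma1 = lambda lam: sigma2 / (2.0 * np.pi) * np.abs(1.0 + a * np.exp(-1j * lam)) ** 2
r = covariances_from_density(f_ma1, 3)          # approximately [2.72, 1.2, 0.0, 0.0]
```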

The framework of the problems considered is as follows. A time series $y_1, \ldots, y_N$, a partial realization of the process $\{y_t\}$, is observed, and we wish to draw inferences from the observations about the structure of the process $\{y_t\}$. Problems of estimation and testing with respect to the regression coefficients are considered. A typical example would be that of a linear regression. Then there would be two regression vectors

$$\varphi^{(1)} = (\ldots, 1, 1, \ldots), \qquad \varphi^{(2)} = (\ldots, 1, 2, \ldots, t, \ldots)$$

corresponding to the regression coefficients $c_1, c_2$ of the regression $m_t = c_1 + c_2 t$. Problems of estimation and testing with respect to the spectral distribution function and the spectral density (the derivative of the spectral distribution function) are discussed. Confidence bands for the spectral distribution function and spectral density are set up. It turns out that many of the results have an asymptotic nonparametric character, that is, many of the limit theorems obtained (asymptotic distribution theory, etc.) do not depend on the spectrum. The approach is quite different from most of the earlier work in time series analysis and is much more general in scope.

In Chapter 1 the basic probability theory required is introduced. The concepts of stationarity and spectrum are discussed and illustrated by examples drawn for the most part from physical fields.

In the second chapter the linear problems of prediction, interpolation and filtering are discussed under the assumption that the spectrum is known. Usually the spectrum is not known unless there is a good deal of prior experience in dealing with problems arising in the same experimental context. Much of the remainder of the book is concerned with the statistical estimation of the spectrum when it is not known.

In Chapter 3 the earlier work on statistical analysis of time series is surveyed. This earlier work is concerned especially with very special finite parameter models. The new techniques proposed differ in that they deal with infinite dimensional models that cover all the special models considered before and thus provide a uniform approach. The power of the new techniques lies in their great generality. The first three chapters serve as an introduction. The remaining chapters deal with the new techniques and their application.

Estimation of the spectral density is considered in Chapter 4. Two types of estimates are discussed in some detail. The first family of estimates, called spectrograph estimates in this book, is well suited for computation on a digital computer, while the second class consists of the estimates that are natural to build into analogue computers. The bias and asymptotic variance of these estimates are considered. It turns out that any good estimate of the spectral density is biased. The mean square error of an estimate is a convenient measure of how good the estimate is, and it is discussed in detail for some special estimates.
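
For orientation only, a generic lag-window estimate of the spectral density is sketched below. It weights the sample covariances with a triangular (Bartlett) window of truncation point $M$; we do not claim this is the book's spectrograph estimate, and the function names, the mean correction and the choice of window are our assumptions. Raising $M$ narrows the spectral window and so reduces bias at the cost of variance, which is the trade-off the mean square error measures.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """r_hat_v = (1/N) sum_t (x_t - xbar)(x_{t+v} - xbar) for v = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[: n - v], x[v:]) / n for v in range(max_lag + 1)])

def bartlett_spectral_estimate(x, m, lams):
    """f_hat(lam) = (1/2pi) sum_{|v|<m} (1 - |v|/m) r_hat_v e^{-i v lam} for real data."""
    r = sample_autocov(x, m - 1)
    v = np.arange(1, m)
    w = 1.0 - v / m                              # triangular lag window
    # r_hat_{-v} = r_hat_v, so the two-sided sum folds into a cosine series
    return (r[0] + 2.0 * np.cos(np.outer(lams, v)) @ (w * r[1:])) / (2.0 * np.pi)

rng = np.random.default_rng(1)
x = rng.standard_normal(20000)                   # white noise: f(lam) = 1/(2 pi)
lams = np.linspace(0.0, np.pi, 50)
f_hat = bartlett_spectral_estimate(x, 20, lams)
```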

The chapter on applications, Chapter 5, considers the model of a stationary process as it arises in several applied fields where it has been found useful. Aspects of the study of random noise, turbulence and storm-generated ocean waves are developed with this in mind.

The asymptotic distribution of a class of estimates of the spectral distribution function is developed in Chapter 6. Confidence bands are set up for the spectral distribution function, and one- and two-sample tests are discussed. These results have an asymptotic nonparametric character. Remarks are made about the distribution theory of estimates of the spectral density. Examples of spectral analysis of artificially generated time series are included in this chapter.
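
A natural sample analogue of the spectral distribution function integrates the periodogram $I_N(\mu) = (2\pi N)^{-1}\,|\sum_{t=1}^{N} x_t e^{-it\mu}|^2$. The sketch below, assuming NumPy and a zero-mean series (both our assumptions), integrates term by term from $0$ to $\lambda$, using $I_N(\mu) = (2\pi)^{-1}\sum_{|v|<N} \hat r_v e^{-iv\mu}$ with $\hat r_v = N^{-1}\sum_t x_t x_{t+v}$. It illustrates the kind of statistic studied, not necessarily the book's exact definition.

```python
import numpy as np

def integrated_periodogram(x, lams):
    """F_N(lam) = int_0^lam I_N(mu) dmu for lam in [0, pi]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:] / n    # r_hat_v for v = 0, ..., N-1
    v = np.arange(1, n)
    # int_0^lam cos(v mu) dmu = sin(v lam) / v
    return (r[0] * lams + 2.0 * np.sin(np.outer(lams, v)) @ (r[1:] / v)) / (2.0 * np.pi)

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
lams = np.linspace(0.0, np.pi, 101)
F_hat = integrated_periodogram(x, lams)   # F_hat[-1] equals r_hat_0 / 2, since sin(v pi) = 0
```

For unit-variance white noise the curve should stay near the straight line $\lambda/2\pi$, and since $I_N \ge 0$ it is nondecreasing.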

Chapter 7 deals with regression analysis. Linear unbiased estimates of the regression coefficients are discussed. The least squares estimate (computed under the assumption that the residuals are independent) and the Markov estimate (the minimum variance linear unbiased estimate) are compared. Conditions under which the least squares estimate is asymptotically as good as the Markov estimate are given. These conditions are satisfied, for example, for polynomial or trigonometric regression. It appears that these asymptotic results on estimation of regression coefficients are approximately valid for moderate and perhaps even small samples.
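
The comparison can be illustrated numerically when the residual covariance matrix $R$ is known. In the sketch below (our own construction, assuming NumPy), $X$ holds the regression vectors $\varphi^{(1)} = (1, 1, \ldots)$ and $\varphi^{(2)} = (1, 2, \ldots, t, \ldots)$ as columns; the least squares estimate has covariance $(X'X)^{-1}X'RX(X'X)^{-1}$ and the Markov estimate has covariance $(X'R^{-1}X)^{-1}$. The first order autoregressive residual covariance is an assumed example.

```python
import numpy as np

def lse_and_markov_covariances(X, R):
    """Covariance matrices of the least squares and Markov estimates of c in y = X c + x,
    where the residual vector x has mean zero and known covariance matrix R."""
    XtX_inv = np.linalg.inv(X.T @ X)
    cov_lse = XtX_inv @ X.T @ R @ X @ XtX_inv
    cov_markov = np.linalg.inv(X.T @ np.linalg.solve(R, X))
    return cov_lse, cov_markov

def ar1_covariance(n, rho):
    """r_{t-s} = rho^|t-s| / (1 - rho^2): first order autoregression, unit innovations."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return rho ** lags / (1.0 - rho ** 2)

n = 200
t = np.arange(1, n + 1, dtype=float)
X = np.column_stack([np.ones(n), t])           # columns phi^(1) and phi^(2)
cov_lse, cov_markov = lse_and_markov_covariances(X, ar1_covariance(n, 0.5))
efficiency = np.diag(cov_markov) / np.diag(cov_lse)   # at most 1: Markov has minimum variance
print(efficiency)
```

For this polynomial regression the asymptotic results of Chapter 7 suggest ratios close to 1; the printout lets the reader check how close they are for a particular $n$ and $\rho$.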

The last chapter discusses assorted problems on the maxima and zeros of time series, as well as prediction when the spectrum is not known but is estimated from the time series.

The reader will notice that almost all the examples discussed in the text are chosen from the physical sciences. This is simply because some of the most natural and successful applications of stationary stochastic processes have been in these fields.

Something should be said about the limitations of the methods presented in this monograph. As is apparent, we have studied only processes with stationary residuals. It is well known that equilibrium conditions are simpler to analyze than evolution, and the methods presented probably cannot be extended to the nonstationary case without essential changes. Furthermore, we have dealt only with discrete time, although in many of the problems we discuss this is highly unnatural. In some cases the results can be extended to the case of a continuous time parameter (see Grenander [1] for a general outline of how this can be done), but in other cases (e.g., the problems studied in Chapter 7) unsolved problems arise, some of them of considerable analytical interest. These questions should be studied further. Very little attention is paid to vector processes, although they arise in a number of important applications. Here, too, an extension seems possible and desirable (see Grenander and Rosenblatt [6] and Rosenblatt [1], [2]).

Finally, only large sample methods are considered. Because time series analysis deals with dependent observations (which reduces the amount of information obtained) and with probability distributions belonging to very wide classes, the sample size at which the asymptotic results start giving useful approximations may be fairly large. It is, of course, important to find out at what sample size such results give realistic approximations. This question deserves closer attention, perhaps via numerical methods.

If the reader is disturbed enough by these limitations to extend the methods of analysis, then this monograph will have served one of its main purposes: to stimulate research in time series analysis that will lead to practically useful and theoretically sound methods.

Each chapter of the book is divided into numbered sections. Section 6.2 refers to section 2 of Chapter 6. The formula numbering starts afresh at the beginning of each section. Formula (2) mentioned in the text of section 6.2 refers to formula (2) of that same section. Formula (6.1.2) mentioned in the text of section 6.2 refers to formula (2) of section 6.1.

Some problems have been included in the book, partly to provide the reader with exercises and partly to lead the reader on to derive results that supplement and extend the theory given in the text.

CHAPTER 1

STATIONARY STOCHASTIC PROCESSES
AND THEIR REPRESENTATIONS

1.0. Introduction

The basic results on stationary stochastic processes are derived in this chapter.

1.1. What Is a Stochastic Process?

We first give the general and rather abstract definition of a stochastic process. The concrete meaning and importance of this definition will be understood by considering the simple examples in this and the following sections. A stochastic process $x_t(w)$ is an indexed set of stochastic variables. We do not necessarily restrict ourselves to real-valued stochastic variables, but consider also complex and vector-valued stochastic variables. Here $t$ is an element of the index set $T$, which is not yet specified. The set $T$ can be the real line, the set of natural numbers, or a collection of points in a higher dimensional Euclidean space. It could also be the collection of all Borel sets on such a space or on the real line, in which case $x_t(w)$ is a stochastic set function. These are just a few possible choices, but they will be adequate for our purpose.

For every value of $t$, $x_t(w)$ is a function on a measure space $\Omega$. A probability measure $P$ is defined on a $\sigma$-algebra of subsets of $\Omega$. The process $x_t(w)$ should be measurable with respect to $P$ for each $t \in T$. (See Halmos [1].)

If $w$ is fixed, $x_t(w)$ is a function of $t$, a sample function or realization of the process, and by letting $w$ take all values in $\Omega$ we get a collection or ensemble of such functions.

In a given situation one can choose $\Omega$ and define $P$ in many different ways, but the following one is adequate in many cases. Choose $\Omega$ as the set of all functions on $T$. The set of values of the process is denoted by $A$; in this book it will always be a finite dimensional complex or real vector space. For any finite set of $t$-values $t_1, t_2, \ldots, t_n$ we consider the event

$$\{x_{t_j}(w) \in A_j;\ j = 1, 2, \ldots, n\} \subset \Omega, \qquad (1)$$

where the $A_j$ are Borel sets in $A$. The probabilities of all such events are given. This defines a finitely additive measure on these events. If the probabilities of all such events are defined in a consistent manner, it has been shown that this measure can be extended to a completely additive probability measure $P$ defined on the Borel field generated by these events. One can then attribute a probability to sets determined by the values of $x_t(w)$ at a denumerable number of $t$-values. For a detailed statement and proof see Kolmogoroff [1].

In many cases one is interested in the behavior of the process at more than a denumerable number of $t$-points. Difficulties arise in considering such a problem; however, they can be overcome in most of the usual contexts. Before proceeding with the main development, we shall discuss a few specific processes and see how these difficulties are overcome.

Imagine that we could observe a particle in linear Brownian movement. Such a motion has been observed to have a very erratic and irregular character, and it has seemed appropriate to use a stochastic model. Denoting the position of the particle at time $t$ by $x_t(w)$, assume that the particle starts out at position zero at $t = 0$, $x_0(w) = 0$. The changes in position of the particle over nonoverlapping time intervals are assumed to be independent. This means that the change of position is not influenced by the past. Furthermore, the distribution of the displacement during a time interval $(a, b)$ is supposed to be normal with mean zero and variance $\sigma^2(b - a)$. It is clear that these properties completely determine the probabilities of sets of the type (1), so that the procedure of Kolmogoroff can be carried out.
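
The finite dimensional distributions just described can be sampled on a grid of time points by summing independent normal increments. A minimal sketch, assuming NumPy; the grid and the parameter names are our own choices.

```python
import numpy as np

def brownian_paths(n_paths, n_steps, t_end, sigma, rng):
    """Values x_t at t_k = k * t_end / n_steps, k = 0, ..., n_steps, with x_0 = 0."""
    dt = t_end / n_steps
    # increments over the nonoverlapping intervals (t_{k-1}, t_k) are independent N(0, sigma^2 dt)
    steps = rng.normal(0.0, sigma * np.sqrt(dt), size=(n_paths, n_steps))
    start = np.zeros((n_paths, 1))
    return np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)

rng = np.random.default_rng(3)
paths = brownian_paths(20000, 50, 2.0, 1.5, rng)
# column 25 is t = 1.0 and the last column is t = 2.0; E x_1 x_2 = sigma^2 min(1, 2) = 2.25
print(np.mean(paths[:, 25] * paths[:, -1]), paths[:, -1].var())   # roughly 2.25 and 4.5
```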

However, we might be interested in the probability that the particle does not reach an absorbing barrier placed at the point $x = 1$ during a time interval $(0, \tau)$, i.e., the probability of

$$\{x_t(w) < 1 \text{ for all } t \text{ in } (0, \tau)\} \subset \Omega. \qquad (2)$$

This set does not belong to the Borel field considered.

What probability should one attribute to such an event? If we instead consider the set

$$\{x_t(w) < 1 \text{ for all rational } t \text{ in } (0, \tau)\} \subset \Omega, \qquad (3)$$

it belongs to the Borel field and hence has a well-defined probability. If the "absorbing barrier" probability has any intuitive meaning, it should be equal to the probability of the last set. Now, if we had been able to assume on a priori grounds that the only possible paths are fully determined by their values at any denumerable dense set of points, the sets (2) and (3) would be the same, and there would be no difficulty in defining probabilities of sets such as (2). One possibility would be to consider continuous paths only, which can be shown to be the natural set of sample functions in the case of Brownian motion (see Doob [2]).
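
The role of the denumerable set in (3) can be seen numerically. If the barrier is checked only at finitely many equally spaced times, the probability that all checked values stay below it can only decrease as the grid is refined (for nested grids), and for continuous paths it approaches the value given by the reflection principle, $P\{\max_{0 \le t \le \tau} x_t < 1\} = 2\Phi(1/(\sigma\sqrt{\tau})) - 1$, where $\Phi$ is the standard normal distribution function. The reflection formula is a standard fact not derived in this section; the simulation sizes and function names are arbitrary choices of ours.

```python
import math
import numpy as np

def prob_below_on_grid(n_steps, tau=1.0, sigma=1.0, barrier=1.0, n_paths=20000, seed=0):
    """Monte Carlo estimate of P{x_t < barrier at t = k tau / n_steps, k = 1, ..., n_steps}."""
    rng = np.random.default_rng(seed)
    step_sd = sigma * math.sqrt(tau / n_steps)
    x = np.zeros(n_paths)
    below = np.ones(n_paths, dtype=bool)
    for _ in range(n_steps):
        x += rng.normal(0.0, step_sd, size=n_paths)   # independent normal increment
        below &= x < barrier                           # barrier checked only at grid points
    return below.mean()

def prob_below_continuous(tau=1.0, sigma=1.0, barrier=1.0):
    """Reflection principle: 2 Phi(a / (sigma sqrt(tau))) - 1 = erf(a / (sigma sqrt(2 tau)))."""
    return math.erf(barrier / (sigma * math.sqrt(2.0 * tau)))

for n in (1, 4, 16, 64, 256):
    print(n, prob_below_on_grid(n))
print("continuous paths:", prob_below_continuous())
```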

Another model of the linear Brownian motion is the following. Take

$$x_t(w) = z_0(w)\, t + \sum_{\nu=1}^{\infty} z_\nu(w)\, \frac{\sqrt{2}\, \sin \nu\pi t}{\nu\pi}, \qquad 0 \le t \le 1, \qquad (4)$$

where the $z_\nu$'s are independent stochastic variables, normally distributed with mean zero and standard deviation $\sigma$. A theorem of Paley and Wiener (see Paley-Wiener [1]) shows that this series converges uniformly in $t$ with probability 1. Because of the uniform convergence of (4), the sample functions are almost certainly continuous. As a linear combination of normal stochastic variables, $x_t(w)$ is a normal process in the sense that all the finite dimensional distribution functions are multivariate normal. It is easily verified that $E x_t(w) = 0$ and

$$E x_s(w)\, x_t(w) = \sigma^2 s t + \frac{2\sigma^2}{\pi^2} \sum_{\nu=1}^{\infty} \frac{\sin \nu\pi s\, \sin \nu\pi t}{\nu^2} = \sigma^2 \min(s, t).$$

This implies that all the finite dimensional distributions are the same as in the previous representation. Note that, in spite of this, it is not the same model as that originally considered: the space of sample functions has again been restricted to the subset of continuous sample functions. The probability measure induced by this model is essentially the same as the probability measure induced by the modification of the original model considered above.
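
The covariance identity for (4) can be checked numerically by truncating the series. A sketch assuming NumPy; the truncation point $K$ and the function names are ours. The second function synthesizes an approximate sample path from given values $z_0, \ldots, z_K$.

```python
import numpy as np

def series_covariance(s, t, sigma=1.0, n_terms=20000):
    """sigma^2 (s t + (2/pi^2) sum_{v=1}^{K} sin(v pi s) sin(v pi t) / v^2) with K = n_terms."""
    v = np.arange(1, n_terms + 1)
    tail = np.sum(np.sin(v * np.pi * s) * np.sin(v * np.pi * t) / v**2)
    return sigma**2 * (s * t + 2.0 / np.pi**2 * tail)

def series_path(ts, z):
    """Truncated version of (4): z_0 t + sum_{v=1}^{K} z_v sqrt(2) sin(v pi t) / (v pi)."""
    ts = np.asarray(ts, dtype=float)
    v = np.arange(1, len(z))
    basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(ts, v)) / (v * np.pi)
    return z[0] * ts + basis @ z[1:]

sigma = 2.0
for s, t in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.3)]:
    print(s, t, series_covariance(s, t, sigma), sigma**2 * min(s, t))
```

The neglected tail is bounded by $2\sigma^2/(\pi^2 K)$, so with $K = 20000$ the two printed columns should agree to about four decimals.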

As another example, let us consider a very simplified model of a single line in a telephone network. We are interested in the number of calls made on this line from time $a$ to time $b > a$. The calls are assumed to be so short that they can be considered of zero duration. Let $n(b) - n(a)$ be the number of calls made in the time interval $(a, b)$. For convenience let $n(0) = 0$. Then the variations of $n(t)$ over nonoverlapping time intervals are independent and $n(b) - n(a)$ is Poisson distributed with mean $\lambda(b - a)$, so that all finite dimensional distributions are determined. This process is commonly called the Poisson process. One may be interested in the probability of sets like

$$\{n(t) - n(a) < 1;\ a < t < b\}.$$

Again, the intuitively plausible procedure would be to set the probability of this set equal to that of

$$\{n(t) - n(a) < 1;\ a < t < b,\ t \text{ rational}\},$$

which is well-defined. The natural set of sample functions determined by their values on a denumerable everywhere dense set would now turn out to be the set of nondecreasing jump functions rather than the set of continuous functions.
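
A sample function of the Poisson process can be generated from its jump times. The sketch below uses the standard fact (not derived here) that the gaps between successive calls are independent exponential variables with mean $1/\lambda$, and then estimates the probability that no call occurs in $(a, b)$, which should be close to $e^{-\lambda(b-a)}$. Names and sample sizes are illustrative.

```python
import numpy as np

def poisson_call_times(lam, t_end, rng):
    """Call epochs in (0, t_end]: cumulative sums of exponential gaps with mean 1/lam."""
    times = []
    t = rng.exponential(1.0 / lam)
    while t <= t_end:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

def n_of_t(times, t):
    """Counting function n(t) = number of calls in (0, t]: a nondecreasing jump function."""
    return np.searchsorted(times, t, side="right")

rng = np.random.default_rng(5)
lam, a, b = 2.0, 1.0, 1.5
# n(b) - n(a) counts calls in (a, b]; a call exactly at b has probability zero
no_call = [n_of_t(ts, b) - n_of_t(ts, a) == 0
           for ts in (poisson_call_times(lam, b, rng) for _ in range(20000))]
p_hat = np.mean(no_call)          # compare with exp(-lam * (b - a))
```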

A formal difficulty of the same sort as the one just discussed is the definition of the integral of a stochastic process. This difficulty does not arise if the sample space $\Omega$ can be chosen so that $x_t(w)$ is measurable on $T \times \Omega$ with respect to the product measure $dt \times dP$, where $dt$ is the measure on $T$ we are interested in. Then $x_t(w)$ is a measurable function of $t$ for almost all $w$. Assuming

$$\int_T E|x_t(w)|\, dt < \infty,$$

the integral $\int_T x_t(w)\, dt$ exists for almost all $w$ and is a measurable function of $w$, i.e., a stochastic variable. This is just Fubini's theorem (see, e.g., Halmos [1]).

In the following we shall always assume that $\Omega$ has been chosen so that these questions do not arise. For a detailed discussion of these problems we refer to Doob [2].

1.2. Continuity in the Mean

Let us consider a complex-valued stochastic process $x_t$. We shall assume that the absolute second order moment $E|x_t|^2 < \infty$ (we shall usually leave out the argument $w$). Introduce

$$\begin{aligned} E x_t &= m_t, \\ E(x_s - m_s)\,\overline{(x_t - m_t)} &= r_{s,t}. \end{aligned}$$

In this chapter the mean value function $m_t$ is supposed to be identically zero. The function $r_{s,t}$ is called the covariance function and in a sense measures the dependence between the values of the process at different $t$-points.

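A covariance function is Hermitian, i.e., $r_{s,t} = \overline{r_{t,s}}$, and nonnegative definite. This is immediately seen by considering the variance of any linear combination,

$$E\Big|\sum_{\nu=1}^{n} c_\nu x_{t_\nu}\Big|^2 = \sum_{\nu,\mu=1}^{n} c_\nu \bar{c}_\mu\, r_{t_\nu, t_\mu} \ge 0.$$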

On the other hand, to each Hermitian and nonnegative definite matrix $r_{s,t}$ (it is understood that $s, t$ can be discrete or continuous parameters) there corresponds a stochastic process with $r_{s,t}$ as its covariance function. If $r_{s,t}$ is real, then for any set of $t$-values $t_1, t_2, \ldots, t_n$ the matrix $\{r_{t_\nu, t_\mu};\ \nu, \mu = 1, 2, \ldots, n\}$ is symmetric and nonnegative definite. Hence there is a normal distribution with this as its covariance matrix, and we can define all the finite dimensional probability distributions of the process. Applying Kolmogoroff's method (see section 1.1), we arrive at a process with the desired covariance function. If $r_{s,t}$ takes complex values the same construction is possible, although the process must be allowed to take complex values and the joint distribution of the real and imaginary parts must be specified.
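
For a finite set of $t$-values the construction can be sketched directly: form the matrix $\{r_{t_\nu, t_\mu}\}$, check that it is nonnegative definite, and draw normal vectors with that covariance. The Cholesky factorization used here is our own convenient choice for producing such vectors (it needs a strictly positive definite matrix), not something prescribed by the text; names and tolerances are illustrative.

```python
import numpy as np

def covariance_matrix(r, ts):
    """Matrix {r(t_v, t_mu)} for the finite set of t-values ts."""
    return np.array([[r(s, t) for t in ts] for s in ts], dtype=float)

def sample_normal_process(r, ts, n_samples, rng):
    """Draw n_samples vectors (x_{t_1}, ..., x_{t_n}) from N(0, R), R = {r(t_v, t_mu)}."""
    R = covariance_matrix(r, ts)
    eig = np.linalg.eigvalsh(R)                    # R is symmetric for real r
    if eig.min() < -1e-10 * max(1.0, eig.max()):
        raise ValueError("r is not nonnegative definite on these t-values")
    L = np.linalg.cholesky(R)      # needs R strictly positive definite
    z = rng.standard_normal((n_samples, len(ts)))
    return z @ L.T                                 # rows have covariance L L^T = R

ts = [0.25, 0.5, 0.75, 1.0]
brownian = lambda s, t: min(s, t)                  # Brownian covariance with sigma = 1
samples = sample_normal_process(brownian, ts, 100000, np.random.default_rng(7))
```

A singular but nonnegative definite matrix would pass the eigenvalue check and then fail in the Cholesky step; an eigendecomposition-based factor would handle that case.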

A stochastic process is said to be continuous in the mean (continuous i.m.) at $t = t_0$ if $E|x(t) - x(t_0)|^2 \to 0$ when $t \to t_0$ in the topology of interest. If this is true for all $t_0 \in T$, the process is called continuous in the mean on $T$. Such questions do not arise when the process has a discrete $t$-parameter.

The identity

$$r_{s,t} - r_{s',t'} = r_{s,t} - r_{s',t} + r_{s',t} - r_{s',t'} = E(x_s - x_{s'})\bar x_t + E x_{s'}\overline{(x_t - x_{t'})}$$

holds. By Schwarz' inequality its absolute value does not exceed

$$\sqrt{E|x_t|^2\,E|x_s - x_{s'}|^2} + \sqrt{E|x_{s'}|^2\,E|x_t - x_{t'}|^2}.$$

As $\sqrt{E|x_{s'}|^2} \le \sqrt{E|x_s|^2} + \sqrt{E|x_s - x_{s'}|^2}$, we see that if the process is continuous in the mean, then $r_{s',t'}$ tends to $r_{s,t}$ when $(s', t')$ tends to $(s, t)$.

On the other hand, if the covariance function is continuous at all points of the diagonal $s = t$, then

$$E|x_s - x_t|^2 = r_{s,s} + r_{t,t} - 2\,\mathrm{Re}\,r_{s,t},$$

and as $s \to t$ we have $r_{s,s} \to r_{t,t}$ and $r_{s,t} \to r_{t,t}$, which is real, so continuity i.m. follows. Hence continuity i.m. is equivalent to the continuity of $r_{s,t}$ at the points of the diagonal $s = t$.

Define $\|x\| = \sqrt{E|x - Ex|^2}$. Note that continuity i.m. is not simply related to the continuity of almost all sample functions. The Poisson process is discontinuous with positive probability over any finite interval, but it is continuous i.m., as is easily verified.
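The verification for the Poisson process can be made concrete: with intensity $\beta$ the increment $x(t_0+h) - x(t_0)$ is Poisson distributed with mean $\beta h$, so $E|x(t_0+h) - x(t_0)|^2 = \beta h + (\beta h)^2 \to 0$ as $h \to 0$, while a path on $[0,1]$ jumps with probability $1 - e^{-\beta}$. The Monte Carlo sketch below only illustrates this; the intensity value (called `rate`) and the sample sizes are arbitrary choices.

```python
import numpy as np

rate = 2.0                                   # intensity of the Poisson process (illustrative)
rng = np.random.default_rng(1)

def mean_square_increment(h, n=400_000):
    """Monte Carlo estimate of E|x(t0 + h) - x(t0)|^2 for a Poisson process x.
    The increment is Poisson(rate * h) whatever t0 is."""
    jumps = rng.poisson(rate * h, size=n).astype(float)
    return np.mean(jumps ** 2)

for h in [1.0, 0.1, 0.01, 0.001]:
    exact = rate * h + (rate * h) ** 2       # E N^2 = variance + mean^2 for N ~ Poisson
    print(f"h={h:<6} estimate={mean_square_increment(h):.5f} exact={exact:.5f}")

# Continuity i.m. holds, yet a path on [0, 1] jumps with probability 1 - e^{-rate}.
p_jump_on_unit_interval = 1.0 - np.exp(-rate)
```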

1.3. Stochastic Set Functions of Orthogonal Increments

Let $\Lambda$ be a finite-dimensional Euclidean space of points $\lambda$. On the Borel field $B_\Lambda$ there is defined a $\sigma$-finite measure $\mu$. We now take $T = B_\Lambda$ and consider a stochastic process $z(s) = z(s, \omega)$, $s \in B_\Lambda$, such that

$$z(s_1 \cup s_2) = z(s_1) + z(s_2) \quad \text{if } s_1 \cap s_2 = 0,$$
$$E z(s) = 0,$$
$$E z(s)\overline{z(s')} = \mu(s \cap s').$$

The indication of the dependence of $z(s, \omega) = z(s)$ on $\omega \in \Omega$ is dropped for convenience. In particular, if $s$ and $s'$ are disjoint, the two corresponding stochastic variables are uncorrelated, or orthogonal. Because of this we say that $z(s)$ is a stochastic set function of orthogonal increments, or an orthogonal set function.

We might add that if we ask for independence instead of orthogonality in this definition, we get a so-called differential stochastic set function. The Wiener process and the Poisson process generate stochastic set functions that are easily seen to be not only orthogonal but also differential.

We are now going to see what meaning we can attach to the expression $\int_\Lambda f(\lambda)\,dz(\lambda)$, where $f(\lambda)$ is a complex-valued function.

Let $I$ be an interval in $\Lambda$, i.e., a bounded rectangular parallelepiped, and $c_I(\lambda)$ the corresponding characteristic function. We interpret the integral of $c_I(\lambda)$ with respect to $dz(\lambda)$ as $z(I)$. Similarly we put

$$\int_\Lambda \sum_{\nu=1}^{n}\alpha_\nu c_{I_\nu}(\lambda)\,dz(\lambda) = \sum_{\nu=1}^{n}\alpha_\nu z(I_\nu).$$

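Assume that $\int_\Lambda |f(\lambda)|^2\,d\mu(\lambda) < \infty$. Then there is a sequence of functions $f_n(\lambda)$ such that

$$\int_\Lambda |f(\lambda) - f_n(\lambda)|^2\,d\mu(\lambda) \to 0 \quad \text{as } n \to \infty,$$

where each $f_n(\lambda)$ is of the form $\sum \alpha_\nu c_{I_\nu}(\lambda)$ with the $I_\nu$ disjoint. But on putting

$$J_n = \int_\Lambda f_n(\lambda)\,dz(\lambda),$$

we have

$$E|J_n|^2 = \sum |\alpha_\nu|^2\mu(I_\nu) = \int_\Lambda |f_n(\lambda)|^2\,d\mu(\lambda),$$

so that

$$E|J_n - J_m|^2 = \int_\Lambda |f_n(\lambda) - f_m(\lambda)|^2\,d\mu(\lambda) \to 0 \quad \text{as } n, m \to \infty.$$

But then $J_n$ converges in the mean to a stochastic variable $J$, with

$$E|J|^2 = \lim_{n\to\infty} E|J_n|^2 = \int_\Lambda |f(\lambda)|^2\,d\mu(\lambda)$$

(see Doob [2]). We shall define $\int_\Lambda f(\lambda)\,dz(\lambda)$ as $J$, and it is easy to verify that $J$ does not depend upon the particular sequence $f_n(\lambda)$ by which we have approximated $f(\lambda)$.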

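This integral has the usual properties, which can be proved easily:

$$\int_\Lambda [a f(\lambda) + b g(\lambda)]\,dz(\lambda) = a\int_\Lambda f(\lambda)\,dz(\lambda) + b\int_\Lambda g(\lambda)\,dz(\lambda), \tag{1}$$

$$\lim_{n\to\infty}\int_\Lambda f_n(\lambda)\,dz(\lambda) = \int_\Lambda f(\lambda)\,dz(\lambda) \quad \text{(in the mean)} \tag{2}$$

if and only if $\int_\Lambda |f(\lambda) - f_n(\lambda)|^2\,d\mu(\lambda) \to 0$ as $n \to \infty$, and

$$E\int_\Lambda f(\lambda)\,dz(\lambda)\,\overline{\int_\Lambda g(\lambda)\,dz(\lambda)} = \int_\Lambda f(\lambda)\overline{g(\lambda)}\,d\mu(\lambda). \tag{3}$$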
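The isometry $E|J|^2 = \int_\Lambda |f|^2\,d\mu$ and property (3) can be checked by simulation. The sketch assumes, purely for illustration, $\Lambda = (0,1]$ with Lebesgue measure $\mu$ and the orthogonal set function $z(I) = N(I) - \mu(I)$, where $N$ counts the points of a unit-intensity Poisson process (a centered Poisson set function, which is even differential). The integrals are formed from step-function approximations on 100 cells, as in the construction above; $f(\lambda) = \lambda$ and $g(\lambda) = 1 - \lambda$ are arbitrary test functions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells, n_samples = 100, 40_000
edges = np.linspace(0.0, 1.0, n_cells + 1)     # intervals I_k partitioning (0, 1]
widths = np.diff(edges)                         # mu(I_k), mu = Lebesgue measure
mid = 0.5 * (edges[:-1] + edges[1:])

# Orthogonal set function: z(I) = (number of Poisson points in I) - mu(I).
# E z(I) = 0 and E z(I) z(I') = mu(I ∩ I'); the increments are even independent.
dz = rng.poisson(widths, size=(n_samples, n_cells)) - widths

def integral_of_step_approx(f):
    """J_n = sum_k f(mid_k) z(I_k): the integral of the step function that equals
    f(mid_k) on I_k, one value per simulated realization."""
    return dz @ f(mid)

J_f = integral_of_step_approx(lambda lam: lam)        # f(λ) = λ
J_g = integral_of_step_approx(lambda lam: 1.0 - lam)  # g(λ) = 1 - λ

second_moment = np.mean(J_f ** 2)    # compare with ∫ λ^2 dλ = 1/3 (isometry)
cross_moment = np.mean(J_f * J_g)    # compare with ∫ λ(1 - λ) dλ = 1/6 (property (3))
```

Only the second-order structure enters these identities, so a Gaussian (Wiener) set function would give the same limits.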


1.4. Orthogonal Representations of Stochastic Processes

As we shall see, many important types of processes have a representation in terms of integrals with respect to an orthogonal stochastic set function, of the form

$$x_t = \int_\Lambda f(t, \lambda)\,dz(\lambda), \tag{1}$$

where of course $f(t, \lambda)$ has to be quadratically integrable as a function of $\lambda$ with respect to $\mu$ for every value of $t$. It follows immediately by using property (1.3.3) of the integral that

$$r_{s,t} = E x_s\bar x_t = \int_\Lambda f(s, \lambda)\overline{f(t, \lambda)}\,d\mu(\lambda). \tag{2}$$

What is a bit more difficult to prove, but will be very useful, is that from a covariance representation of the form (2) one can deduce the representation (1) of the process.

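To prove this let us first assume that the system of functions of $\lambda$, $f(t, \lambda)$, when $t$ runs through all values of $T$, is a basis for $L_2(\Lambda) = \{$the set of functions quadratically integrable with respect to $\mu\}$, i.e., every function $\varphi(\lambda) \in L_2(\Lambda)$ can be approximated in the mean by linear combinations

$$\sum_\nu c_\nu f(t_\nu, \lambda), \quad t_\nu \in T.$$

If $s$ is a set in $\Lambda$ of finite $\mu$-measure, then its characteristic function $c_s(\lambda) \in L_2(\Lambda)$. It can then be approximated in the mean by sums

$$c_s^{(n)}(\lambda) = \sum_\nu \alpha_\nu^{(n)} f(t_\nu^{(n)}, \lambda).$$

As

$$\int_\Lambda |c_s^{(n)}(\lambda) - c_s^{(m)}(\lambda)|^2\,d\mu(\lambda) = E|z_n - z_m|^2, \quad \text{where} \quad z_n = \sum_\nu \alpha_\nu^{(n)} x_{t_\nu^{(n)}},$$

it follows that the sequence $z_n$ is a Cauchy sequence and hence converges in the mean to a stochastic variable which we will denote by $z(s)$. If $s_1$ and $s_2$ are disjoint we have

$$E|z(s_1 \cup s_2) - z(s_1) - z(s_2)|^2 = \int_\Lambda |c_{s_1 \cup s_2}(\lambda) - c_{s_1}(\lambda) - c_{s_2}(\lambda)|^2\,d\mu(\lambda) = 0.$$

Also we see easily that

$$E z(s_1)\overline{z(s_2)} = \mu(s_1 \cap s_2),$$

so that $z(s)$ is an orthogonal stochastic set function. Put

$$y_t = \int_\Lambda f(t, \lambda)\,dz(\lambda). \tag{3}$$

But

$$E x_t\overline{z(s)} = \lim_{n\to\infty} E x_t \sum_\nu \overline{\alpha_\nu^{(n)}}\,\overline{x_{t_\nu^{(n)}}} = \lim_{n\to\infty}\sum_\nu \overline{\alpha_\nu^{(n)}}\int_\Lambda f(t, \lambda)\overline{f(t_\nu^{(n)}, \lambda)}\,d\mu(\lambda) = \int_s f(t, \lambda)\,d\mu(\lambda).$$

Using an approximation procedure again we get

$$E x_t\bar y_t = \int_\Lambda |f(t, \lambda)|^2\,d\mu(\lambda).$$

Since also $E|x_t|^2 = E|y_t|^2 = \int_\Lambda |f(t, \lambda)|^2\,d\mu(\lambda)$, by (2) and by property (1.3.3), it follows that

$$E|x_t - y_t|^2 = 0,$$

so that $x_t$ has the representation (1).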

Now we drop the assumption that the functions $f(t, \lambda)$ should form a basis in $L_2(\Lambda)$. If they do not, we add all functions orthogonal to every $f(t, \lambda)$, $t \in T$. These new functions we write as $g(t, \lambda)$, $t \in T'$, where $T'$ has no points in common with $T$. Now we set

$$h(t, \lambda) = \begin{cases} f(t, \lambda) & \text{if } t \in T, \\ g(t, \lambda) & \text{if } t \in T'. \end{cases}$$

Let $y_t = x_t$ when $t \in T$, and for $t \in T'$ let $y_t$ be the normal process with covariance

$$\int_\Lambda g(s, \lambda)\overline{g(t, \lambda)}\,d\mu(\lambda),$$

independent of the $x_t$. Then $y_t$ has covariance

$$\int_\Lambda h(s, \lambda)\overline{h(t, \lambda)}\,d\mu(\lambda), \quad s, t \in T \cup T'.$$

$h(t, \lambda)$ is a basis for $L_2(\Lambda)$ when $t$ runs through $T \cup T'$, so that

$$y_t = \int_\Lambda h(t, \lambda)\,dz(\lambda), \quad t \in T \cup T',$$

where $z$ is an orthogonal stochastic set function. But

$$x_t = y_t = \int_\Lambda h(t, \lambda)\,dz(\lambda) = \int_\Lambda f(t, \lambda)\,dz(\lambda)$$

when $t \in T$, so that we have the desired representation.

This result gives a representation of the process as the limit of certain linear combinations of the functions $f(t, \lambda)$ of $t$, with weights $dz(\lambda)$ which are stochastic and uncorrelated. The result is due to Karhunen [1], and the proof is essentially that given by Cramér [4]. One should note that the integral (1) should not be interpreted sample-function-wise, as in many cases $z(s)$ does not form a completely additive set function with probability one.
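A finite-dimensional analogue of the theorem can be checked numerically. Assume, for illustration only, that $\Lambda$ consists of three points with masses $\mu_k$, that $f(t,\lambda) = e^{it\lambda}$, and that $T$ has eight points. The sketch builds a normal process whose covariance has the form (2) by factoring the covariance matrix, without using $f$ at all, and then recovers uncorrelated weights $z(\{\lambda_k\})$ from the process itself; the pseudoinverse stands in for the mean-square approximation of the proof. It is a toy check under these assumptions, not the infinite-dimensional construction.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(8)                          # T: eight time points
lam = np.array([0.3, 1.1, 2.0])           # Λ: three points (illustrative)
mu = np.array([1.0, 0.5, 0.25])           # μ: the masses at those points
F = np.exp(1j * np.outer(t, lam))         # f(t, λ) = e^{itλ}; shape (8, 3), full column rank

# Covariance of the form (2): r_{s,t} = Σ_k f(s, λ_k) conj(f(t, λ_k)) μ_k.
R = (F * mu) @ F.conj().T

# A process with covariance R built WITHOUT reference to F (eigen-factorization of R).
w, V = np.linalg.eigh(R)
w = np.where(w > 1e-10 * w.max(), w, 0.0)             # drop rounding-level eigenvalues
n = 100_000
xi = (rng.standard_normal((8, n)) + 1j * rng.standard_normal((8, n))) / np.sqrt(2)
X = (V * np.sqrt(w)) @ xi                 # columns: realizations of (x_0, ..., x_7)

# Recover the weights z({λ_k}) from the process alone via the pseudoinverse of F.
Z = np.linalg.pinv(F) @ X
reconstruction_error = np.max(np.abs(F @ Z - X))   # x_t = Σ_k f(t, λ_k) z({λ_k})
Z_cov = Z @ Z.conj().T / n                          # ≈ diag(μ): uncorrelated weights
```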

1.5. Stationary Processes

So far we have imposed very few essential restrictions upon the processes studied. Our main interest in this book is the class of stationary processes. They arise when the random mechanism producing the process does not change with $t$. This situation is often met with in technology and in the physical sciences. In other fields, such as economics, it is often assumed to hold if $T$ is not too large and if the systematic component is isolated in an appropriate way.

Let $x_t$ be an $n$-vector-valued process and $t$ a vector parameter. Then we say that $x_t$ is stationary in the strict sense if the stochastic variables

$$x_{t_1}, x_{t_2}, \ldots, x_{t_m}$$

have the same joint probability distribution as the variables

$$x_{t_1+h}, x_{t_2+h}, \ldots, x_{t_m+h}$$

for all $m$, $t_1, t_2, \ldots, t_m$ and every vector $h$ of the same dimension as $t$. This means that the simultaneous distributions depend only upon the time parameter differences $t_1 - t_2$, $t_1 - t_3$, and so on.

Let $T$ be the set of integers and consider the complex-valued process defined by

$$x_t = \sum_{\nu=1}^{n} c_\nu e^{i(t\lambda_\nu + \varphi_\nu)}, \quad t = \ldots, -1, 0, 1, \ldots,$$

where the $c_\nu$ are real constants, the $\lambda_\nu$ are real constants in the interval $(-\pi, \pi)$, and the $\varphi_\nu$ are independent stochastic variables uniformly distributed in the interval $(-\pi, \pi)$. Consider the joint distribution of $x_{t_1+h}, x_{t_2+h}, \ldots, x_{t_m+h}$. These variables can be written as

$$x_{t_j+h} = \sum_{\nu=1}^{n} c_\nu e^{i(t_j\lambda_\nu + \psi_\nu)}, \quad j = 1, 2, \ldots, m,$$

where $\psi_\nu = \varphi_\nu + h\lambda_\nu$; hence the $\psi_\nu$ are also independent and uniformly distributed, over an interval of length $2\pi$. Since $e^{i\psi}$ has period $2\pi$ in $\psi$, this is equivalent to a uniform distribution on $(-\pi, \pi)$, so the variables $x_{t_j+h}$ have the same joint distribution as the $x_{t_j}$, and the process is strictly stationary. This is called the model of random phases. It has an important application in the theory of noise, where the noise current at time $t$ consists of superimposed alternating current components of frequency $\lambda_\nu/2\pi$ cycles per second with constant amplitudes $c_\nu$ and phases $\varphi_\nu$, which are supposed to be independent with rectangular distributions. In this application one would naturally use a continuous time parameter instead.
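The mean value function is clearly

$$E x_t = \sum_{\nu=1}^{n}\frac{c_\nu}{2\pi}\int_{-\pi}^{\pi} e^{i(t\lambda_\nu + \varphi_\nu)}\,d\varphi_\nu = 0,$$

and the covariance function, because of the independence of the $\varphi_\nu$, is

$$E x_s\bar x_t = \sum_{\nu=1}^{n}\frac{c_\nu^2}{2\pi}\int_{-\pi}^{\pi} e^{i(s-t)\lambda_\nu}\,d\varphi_\nu = \sum_{\nu=1}^{n} c_\nu^2 e^{i(s-t)\lambda_\nu} = \int_{-\pi}^{\pi} e^{i(s-t)\lambda}\,dF(\lambda).$$

Here $F(\lambda)$ is the nondecreasing step function

$$F(\lambda) = \sum_{\lambda_\nu \le \lambda} c_\nu^2.$$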
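A short simulation illustrates the two formulas for the random phase model: the mean is zero, and the covariance depends only on $s - t$ and equals $\sum_\nu c_\nu^2 e^{i(s-t)\lambda_\nu}$. The amplitudes, frequencies and sample size below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)
c = np.array([1.0, 0.7, 0.4])             # amplitudes c_ν (illustrative)
lam = np.array([-2.0, 0.5, 1.3])          # frequencies λ_ν in (-π, π) (illustrative)
n = 200_000
phi = rng.uniform(-np.pi, np.pi, size=(n, c.size))   # independent uniform phases φ_ν

def x(t):
    """x_t = Σ_ν c_ν exp(i(t λ_ν + φ_ν)), one value per simulated phase vector."""
    return np.exp(1j * (t * lam + phi)) @ c

def cov_theory(d):
    """Σ_ν c_ν² e^{i d λ_ν} = ∫ e^{i d λ} dF(λ), with d = s - t."""
    return np.sum(c ** 2 * np.exp(1j * d * lam))

mean_est = np.mean(x(7))                              # should be near 0
cov_est = np.mean(x(5) * np.conj(x(2)))               # lag s - t = 3
cov_est_shifted = np.mean(x(15) * np.conj(x(12)))     # same lag, times shifted by 10
```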
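Consider the process $y_t = \mathrm{Re}\,x_t$. This is the process with physical meaning. Its mean value is

$$E y_t = \sum_{\nu=1}^{n} c_\nu E\cos(t\lambda_\nu + \varphi_\nu) = 0,$$

and its covariance is

$$E y_s y_t = \sum_{\nu=1}^{n} c_\nu^2\int_{-\pi}^{\pi}\cos(s\lambda_\nu + \varphi_\nu)\cos(t\lambda_\nu + \varphi_\nu)\,\frac{d\varphi_\nu}{2\pi} = \frac12\sum_{\nu=1}^{n} c_\nu^2\cos(s-t)\lambda_\nu = \frac12\int_{-\pi}^{\pi}\cos(s-t)\lambda\,dF(\lambda) = \int_0^{\pi}\cos(s-t)\lambda\,dG(\lambda), \tag{1}$$

$$G(\lambda) = \frac12\sum_{|\lambda_\nu| \le \lambda} c_\nu^2.$$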

Let $G(\lambda)$, $0 \le \lambda \le \pi$, be any nondecreasing bounded function. We can then find a sequence of bounded nondecreasing step functions $G^{(n)}(\lambda)$, with all the mass concentrated at the points $k\pi/n$, $k = 0, \ldots, n-1$, such that the total variation of $G^{(n)}(\lambda)$ over every interval approaches that of $G(\lambda)$ over the interval, and the mass of $G^{(n)}(\lambda)$ at each of the points $k\pi/n$ tends to zero as $n \to \infty$. Consider a sequence of processes $y_t^{(n)}$ of random phases corresponding to $G^{(n)}(\lambda)$ respectively. By a simple application of the central limit theorem, the finite-dimensional distributions of $y_t^{(n)}$ converge to the corresponding finite-dimensional distributions of the normal process with covariance (1), where $G(\lambda)$ is the arbitrary nondecreasing function spoken of. The representation of the covariance function obtained in this case will later be shown to be valid for all stationary processes.

For an example of a stationary process depending upon a continuous $t$-parameter we consider the following situation. Let a mechanical system of $n$ degrees of freedom be described by the generalized coordinates $q_1, q_2, \ldots, q_n$ and the corresponding generalized momenta $p_1, p_2, \ldots, p_n$. Assuming the system to be conservative, the motion is determined by the system of differential equations

$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}, \qquad i = 1, 2, \ldots, n,$$

where $H$ is the Hamiltonian and is a function of $q_1, q_2, \ldots, q_n, p_1, p_2, \ldots, p_n$, and $t$ is time. $H$ is supposed to have as many derivatives as are required, so that the system of differential equations has a unique solution for prescribed initial values of the $p_i$ and $q_i$.

The $2n$-space of points $P = (x_1, x_2, \ldots, x_{2n})$, where $x_1 = q_1, \ldots, x_n = q_n$, $x_{n+1} = p_1, \ldots, x_{2n} = p_n$, is called the phase space of the system. A system characterized by a point $P_s$ in phase space at $t = s$ is carried into the state corresponding to $P_{s+t}$ after $t$ seconds. This evidently defines a one-parameter group of transformations $T_t$ of the phase space onto itself. We are interested in the behavior of the volume element under these transformations.

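Let $S$ be a set of finite Lebesgue measure in phase space. Under the transformation $T_t$ the set is transformed into a set $S_t = T_t S$ with the volume

$$\int_{S_t} dx_1\cdots dx_{2n} = \int_S J\,dy_1\cdots dy_{2n},$$

where $T_t(y_1, \ldots, y_{2n}) = (x_1, \ldots, x_{2n})$ and $J$ is the corresponding Jacobian,

$$J = \frac{\partial(x_1, \ldots, x_{2n})}{\partial(y_1, \ldots, y_{2n})}.$$

But

$$\frac{\partial J}{\partial t} = \sum_{k=1}^{2n} J_k, \qquad J_k = \frac{\partial(x_1, \ldots, x_{k-1}, \dot x_k, x_{k+1}, \ldots, x_{2n})}{\partial(y_1, \ldots, y_{2n})},$$

where $\dot x_k = dx_k/dt$. Since $\dot x_k$ is a function of $x_1, \ldots, x_{2n}$,

$$J_k = \sum_{\nu=1}^{2n}\frac{\partial\dot x_k}{\partial x_\nu}\,\frac{\partial(x_1, \ldots, x_{k-1}, x_\nu, x_{k+1}, \ldots, x_{2n})}{\partial(y_1, \ldots, y_{2n})} = \frac{\partial\dot x_k}{\partial x_k}\,J,$$

because every determinant with $\nu \ne k$ has two equal rows. Hence

$$\frac{\partial J}{\partial t} = J\left(\sum_{k=1}^{n}\frac{\partial^2 H}{\partial q_k\,\partial p_k} - \sum_{k=1}^{n}\frac{\partial^2 H}{\partial p_k\,\partial q_k}\right) = 0,$$

so that $J$ does not depend upon $t$; since $T_0$ is the identity, $J = 1$. The Lebesgue volume is then left unchanged by $T_t$. This is Liouville's theorem.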
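Liouville's theorem can be checked numerically by integrating a Hamiltonian flow and estimating the Jacobian determinant of $T_t$ by finite differences. The sketch below assumes a pendulum, $H(q,p) = p^2/2 - \cos q$, as the Hamiltonian system, and for contrast a damped oscillator, which is not Hamiltonian: there $\sum_k \partial\dot x_k/\partial x_k = -\gamma$, so the same computation as in the proof gives $J = e^{-\gamma t}$. The classical Runge-Kutta integrator and the step sizes are our own choices, not anything prescribed by the text.

```python
import numpy as np

def rk4_flow(field, y0, t_end, steps=2000):
    """Integrate dy/dt = field(y) from y0 over [0, t_end] with classical RK4."""
    y = np.array(y0, dtype=float)
    h = t_end / steps
    for _ in range(steps):
        k1 = field(y)
        k2 = field(y + 0.5 * h * k1)
        k3 = field(y + 0.5 * h * k2)
        k4 = field(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

def flow_jacobian_det(field, y0, t_end, eps=1e-6):
    """det ∂(x_1..x_2n)/∂(y_1..y_2n) of the time-t_end flow, by central differences."""
    y0 = np.asarray(y0, dtype=float)
    cols = []
    for k in range(y0.size):
        e = np.zeros_like(y0)
        e[k] = eps
        diff = rk4_flow(field, y0 + e, t_end) - rk4_flow(field, y0 - e, t_end)
        cols.append(diff / (2 * eps))
    return np.linalg.det(np.column_stack(cols))

# Pendulum: dq/dt = ∂H/∂p = p, dp/dt = -∂H/∂q = -sin q.
pendulum = lambda y: np.array([y[1], -np.sin(y[0])])
# Damped oscillator (not Hamiltonian): the divergence of the field is -gamma.
gamma = 0.3
damped = lambda y: np.array([y[1], -y[0] - gamma * y[1]])

det_hamiltonian = flow_jacobian_det(pendulum, [1.0, 0.2], t_end=2.0)   # ≈ 1
det_damped = flow_jacobian_det(damped, [1.0, 0.2], t_end=2.0)          # ≈ e^{-0.6}
```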

If we restrict ourselves to a subset of finite volume of the phase space which is time invariant (sometimes this is done by choosing a region between two surfaces of constant energy), we can get a probability measure by norming by the total volume of this subset. Starting out with a uniform a priori distribution in the subset, it will clearly remain unchanged as time proceeds. If we are interested in some phase function, i.e., a function of $p_1, \ldots, p_n, q_1, \ldots, q_n$, and consider it when $t$ varies, it is evidently a strictly stationary stochastic process. Similarly we get strictly stationary vector processes by considering several phase functions simultaneously. In dealing with situations like this, attempts were made to prove the equality of time averages and space averages, and as a result certain ergodic theorems were proved. Such ergodic theorems will be discussed briefly in section 1.7.

In some contexts one meets processes that are not strictly stationary but have the weaker property that the mean value function is identically constant and the covariance function $r_{s,t}$ depends only upon the difference $s - t$, $r_{s,t} = r_{s-t}$. We call such processes weakly stationary processes.

As a simple example we mention the sine functions

$$x_t = \sin 2\pi t\alpha,$$

where $t$ is a positive integer and $\alpha$ is a rectangular stochastic variable in the interval $(0, 1)$. Of course

$$E x_t = 0,$$

$$E x_s x_t = \int_0^1 \sin 2\pi s\alpha\,\sin 2\pi t\alpha\,d\alpha = \tfrac12\,\delta_{s-t} \quad (\delta_0 = 1,\ \delta_k = 0 \text{ for } k \ne 0),$$

which shows the weak stationarity of $x_t$. But $x_t$ is not strictly stationary.
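Both claims about the sine functions can be checked with a quadrature over $\alpha$. The midpoint rule on $m$ equally spaced nodes integrates these trigonometric products exactly (up to rounding) when the frequencies involved are below $m$, so the covariance matrix comes out as $\frac12 I$. For the failure of strict stationarity, one convenient witness (our choice) is the identity $x_2^2 = 4x_1^2(1 - x_1^2)$, which holds for every $\alpha$, while the shifted pair $(x_2, x_3)$ does not satisfy it; hence $(x_1, x_2)$ and $(x_2, x_3)$ cannot have the same joint distribution.

```python
import numpy as np

# Midpoint-rule nodes for α uniform on (0, 1).
m = 100_000
alpha = (np.arange(m) + 0.5) / m

def x(t):
    """x_t = sin(2π t α), evaluated at every quadrature node α."""
    return np.sin(2 * np.pi * t * alpha)

def second_moment(s, t):
    """E x_s x_t = ∫_0^1 sin(2π s α) sin(2π t α) dα by the midpoint rule."""
    return np.mean(x(s) * x(t))

cov = np.array([[second_moment(s, t) for t in range(1, 6)] for s in range(1, 6)])
# Weak stationarity: cov ≈ 0.5 * identity, a function of s - t only.

# Not strictly stationary: x_2 = 2 x_1 cos(2πα), so x_2² = 4 x_1² (1 - x_1²) for all α,
# while the shifted pair (x_2, x_3) does not obey the same relation.
gap_12 = np.max(np.abs(x(2) ** 2 - 4 * x(1) ** 2 * (1 - x(1) ** 2)))
gap_23 = np.max(np.abs(x(3) ** 2 - 4 * x(2) ** 2 * (1 - x(2) ** 2)))
```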
Consider the process $y_t = g(x + t\alpha)$, where $x$ is uniformly distributed on $(0, 1)$, $\alpha$ is independent of $x$ and has distribution function $F(\alpha)$, and $g$ is a quadratically integrable function with period 1. This class of processes is weakly stationary. Here the mean is $m = E y_t = \int_0^1 g(x)\,dx$ and

$$r_\tau = \operatorname{cov}(y_t, y_{t+\tau}) = \int_{-\infty}^{\infty}\int_0^1 g(u)\overline{g(u + \tau\alpha)}\,du\,dF(\alpha) - |m|^2.$$

These processes can also be shown to be strictly stationary, just as the random phase model.

1.6. Representations of Stationary Processes

In Sections 1.6-1.9 we consider complex-valued weakly stationary stochastic processes with $T$ either the set of integers, or the real line, or a finite-dimensional vector space.

We know that $\{r_{\nu-\mu}\}$ forms a nonnegative definite matrix when $\nu, \mu$ run through all integers. Introduce
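where the prime in the summation sign means that the term corresponding to $\nu = 0$ is $r_0(\lambda + \pi)$. Let $(\lambda_1, \lambda_2)$ be an interval in $(-\pi, \pi)$. Then

$$F_N(\lambda_2) - F_N(\lambda_1) = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N} r_{\nu-\mu}\,\frac{e^{-i(\nu-\mu)\lambda_2} - e^{-i(\nu-\mu)\lambda_1}}{-i(\nu-\mu)} = \frac{1}{2\pi N}\int_{\lambda_1}^{\lambda_2} E\left|\sum_{\nu=1}^{N} x_\nu e^{-i\nu u}\right|^2 du \ge 0,$$

where a term with $\nu = \mu$ is to be read as $r_0(\lambda_2 - \lambda_1)$. Noting that $F_N(-\pi) = 0$ and $F_N(\pi) = r_0$, we see that the $F_N(\lambda)$ are bounded nondecreasing functions in $(-\pi, \pi)$. Then we can extract a subsequence $F_{N_\nu}(\lambda)$ converging to a function $F(\lambda)$ of the same type at all points of continuity of $F(\lambda)$ (see Cramér [3]). But

$$\int_{-\pi}^{\pi} e^{in\lambda}\,dF_{N_\nu}(\lambda) = \begin{cases} \left(1 - \dfrac{|n|}{N_\nu}\right) r_n, & |n| < N_\nu, \\ 0, & |n| \ge N_\nu. \end{cases}$$

For any fixed $n$, on letting $\nu$ tend to infinity we get

$$r_n = \int_{-\pi}^{\pi} e^{in\lambda}\,dF(\lambda).$$

This is called the spectral representation of the covariance function, and the theorem is due to Herglotz [1]. It is easily seen that $F(\lambda)$ is unique.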

If $x_t$ is real-valued, $r_t = r_{-t}$, so that the distribution of the spectrum is symmetric around $\lambda = 0$. Then we can write

$$r_t = 2\int_0^{\pi}\cos t\lambda\,dF(\lambda).$$

When $T$ is the real line and $x_t$ is weakly stationary and continuous in the mean (which is equivalent to $r_t$ being continuous at $t = 0$, see section 1.2), it can be proved that

$$r_t = \int_{-\infty}^{\infty} e^{it\lambda}\,dF(\lambda),$$

where $F(\lambda)$ is again a bounded nondecreasing function. Similarly, for a real process

$$r_t = 2\int_0^{\infty}\cos t\lambda\,dF(\lambda).$$

For a proof see Bochner [1]. One may note that these relations are of the same form as that between the distribution function and the characteristic function of a stochastic variable.

In both the discrete and continuous parameter cases $F(\lambda)$ is called the spectral distribution function of the process, and it can be decomposed into three components

$$F(\lambda) = F_a(\lambda) + F_d(\lambda) + F_s(\lambda),$$

where $F_a(\lambda)$ is absolutely continuous, $F_d(\lambda)$ a step function, and $F_s(\lambda)$ a continuous function whose derivative vanishes almost everywhere. We can write

$$F_a(\lambda) = \int_{-\pi}^{\lambda} f(u)\,du, \qquad F_d(\lambda) = \sum_{\lambda_\nu \le \lambda}\Delta F(\lambda_\nu)$$

(with $-\pi$ replaced by $-\infty$ in the continuous parameter case), where $f(\lambda) = F'(\lambda) \ge 0$ is called the spectral density of the process and the $\lambda_\nu$ are the discontinuity points of $F(\lambda)$, with the corresponding saltuses $\Delta F(\lambda_\nu)$. The third part $F_s(\lambda)$ is called the singular part of the decomposition. In the case one usually meets in practice, $f(\lambda)$ is a smooth function, there are at most a small number of saltuses, and the singular part is absent.
If $T$ is $n$-dimensional Euclidean space or the set of all integer lattice points in it, one can obtain a similar representation of the covariance function by the same method. One finds that

$$r_t = \int_\Lambda e^{it\cdot\lambda}\,dF(\lambda),$$

where $\Lambda$ is the $n$-dimensional Euclidean space in the first case, and the $n$-cube with sides $(-\pi, \pi)$ in the second case. $F(\lambda)$ defines a bounded measure in both cases, and we use the notation $x \cdot y$ to denote the inner product of the two $n$-vectors $x$ and $y$.

As we have the representation of the covariance function

$$r_{s,t} = r_{s-t} = \int_{-\pi}^{\pi} e^{is\lambda}e^{-it\lambda}\,dF(\lambda), \quad s, t = \ldots, -1, 0, 1, \ldots,$$

we can apply the representation theorem of section 1.4 with $\Lambda = (-\pi, \pi)$, $f(t, \lambda) = e^{it\lambda}$ and $d\mu = dF$. Then there is an orthogonal set function $z(s)$ defined on the Borel sets of the interval $(-\pi, \pi)$ such that

$$x_t = \int_{-\pi}^{\pi} e^{it\lambda}\,dz(\lambda), \quad t = \ldots, -1, 0, 1, \ldots,$$

and

$$E z(s_1)\overline{z(s_2)} = \int_{s_1 \cap s_2} dF(\lambda)$$

(Cramér [1]).

In the case of a real process we get the real representation

$$x_t = \int_0^{\pi}\cos t\lambda\,dz_1(\lambda) + \int_0^{\pi}\sin t\lambda\,dz_2(\lambda),$$

where $z_1(\lambda)$ and $z_2(\lambda)$ are orthogonal real-valued stochastic set functions with

$$E z_1(s_1) z_2(s_2) = 0, \qquad E z_1^2(s) = E z_2^2(s) = 2\int_s dF(\lambda).$$

A model used extensively, but in a more elaborate form, in econometrics is the following. Let $x_t$ and $y_t$ be the price and supply of a certain commodity at time $t = \ldots, -1, 0, 1, \ldots$. The price $x_t$ and supply $y_t$ are assumed to be related by the difference equations

$$x_t = \alpha - \beta y_t + \eta'_t, \qquad y_t = \gamma + \delta x_{t-1} + \eta''_t,$$

where $\alpha, \beta, \gamma, \delta$ are real constants and $\eta'_t$ and $\eta''_t$ are sequences of stochastic variables representing the random disturbances that the economic system is exposed to. Solving for $x_t$ we get

$$x_t = \alpha - \beta\gamma - \beta\delta x_{t-1} + \eta'_t - \beta\eta''_t$$

(see Koopmans [2] and Wold [3]). This is a stochastic difference equation of order 1. Let us consider the more general case of a difference equation of order $p$ with constant coefficients,

$$a_0 x_{t+p} + a_1 x_{t+p-1} + \cdots + a_p x_t = \eta_t, \tag{1}$$

or $L x_t = \eta_t$, where the linear operator $L$ is defined by

$$L = a_0\tau^p + a_1\tau^{p-1} + \cdots + a_p\tau^0.$$

Here $\tau$ stands for the translation operator, $\tau x_t = x_{t+1}$. We shall assume that the $\eta_t$ have mean value zero, variance $\sigma^2$, and are uncorrelated. A process $x_t$ satisfying such a difference equation is called an autoregressive process.
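We are interested in finding the weakly stationary stochastic processes satisfying (1). It is clear that $\eta_t$ is weakly stationary, so that it can be represented as

$$\eta_t = \int_{-\pi}^{\pi} e^{it\lambda}\,dz_\eta(\lambda).$$

As

$$E\eta_s\bar\eta_t = \sigma^2\delta_{s-t} = \int_{-\pi}^{\pi} e^{i(s-t)\lambda}\,\frac{\sigma^2}{2\pi}\,d\lambda,$$

this process has an absolutely continuous spectrum with the constant spectral density $f(\lambda) = \sigma^2/2\pi$. Such a process is called white noise. If $x_t$ is a stationary solution of (1), then

$$x_t = \int_{-\pi}^{\pi} e^{it\lambda}\,dz_x(\lambda), \qquad E x_s\bar x_t = \int_{-\pi}^{\pi} e^{i(s-t)\lambda}\,dF_x(\lambda),$$

and

$$L x_t = \int_{-\pi}^{\pi}\sum_{\nu=0}^{p} a_\nu e^{i(p-\nu)\lambda}\,e^{it\lambda}\,dz_x(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda}\,dz_\eta(\lambda).$$

Put $\varphi(z) = \sum_{\nu=0}^{p} a_\nu z^{p-\nu}$.

Approximating the function which is 1 for $-\pi < \lambda < \mu$ and zero for $\mu < \lambda < \pi$ by the complete system of functions $e^{it\lambda}$, $t = 0, \pm 1, \pm 2, \ldots$, and using property (1.3.2) for integrals with respect to stochastic set functions (see section 1.3), we get

$$\int_{-\pi}^{\mu}\varphi(e^{i\lambda})\,dz_x(\lambda) = z_\eta(\mu).$$

From this it follows that

$$\int_{-\pi}^{\mu}|\varphi(e^{i\lambda})|^2\,dF_x(\lambda) = \frac{\sigma^2}{2\pi}(\mu + \pi). \tag{2}$$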

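The left-hand side is equal to

$$\int_{-\pi}^{\mu}|\varphi(e^{i\lambda})|^2\left[f_x(\lambda)\,d\lambda + dF_{x,d}(\lambda) + dF_{x,s}(\lambda)\right].$$

The singular part must be absent, as the right member of (2) is absolutely continuous and $\varphi$ is analytic. If $F_x(\lambda)$ had any jumps, then $\varphi(e^{i\lambda})$ would have to be zero at these jumps, since the right-hand side of (2) is continuous. Thus

$$\int_{-\pi}^{\mu}|\varphi(e^{i\lambda})|^2 f_x(\lambda)\,d\lambda = \frac{\sigma^2}{2\pi}(\mu + \pi)$$

and $f_x(\lambda) = \dfrac{\sigma^2}{2\pi}\,\dfrac{1}{|\varphi(e^{i\lambda})|^2}$. But from this it follows that $\varphi(e^{i\lambda})$ can have no zeros, as $f_x(\lambda)$ would otherwise not be integrable; in particular $F_x(\lambda)$ has no jumps. Hence we have proved that

$$x_t = \int_{-\pi}^{\pi}\frac{e^{it\lambda}}{\varphi(e^{i\lambda})}\,dz_\eta(\lambda) \tag{3}$$

is the only stationary solution of the difference equation, and that it has an absolutely continuous spectrum with the spectral density $f_x(\lambda)$ given above. Note that $\varphi(e^{i\lambda})$ has no zeros for real $\lambda$.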
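The spectral density $f_x(\lambda) = \sigma^2/(2\pi|\varphi(e^{i\lambda})|^2)$ can be checked against a known covariance. For the first-order equation $x_{t+1} - a x_t = \eta_t$ (so $a_0 = 1$, $a_1 = -a$, $\varphi(z) = z - a$) with $|a| < 1$, the stationary covariance is $r_n = \sigma^2 a^{|n|}/(1 - a^2)$. The sketch integrates $e^{in\lambda} f_x(\lambda)$ over $(-\pi, \pi)$ with the rectangle rule, which is highly accurate for smooth periodic integrands; the values of $a$ and $\sigma^2$ are illustrative.

```python
import numpy as np

def ar_spectral_density(a, sigma2, lam):
    """f_x(λ) = σ² / (2π |φ(e^{iλ})|²) with φ(z) = a_0 z^p + a_1 z^{p-1} + ... + a_p."""
    phi = np.polyval(a, np.exp(1j * lam))
    return sigma2 / (2 * np.pi * np.abs(phi) ** 2)

def covariance_from_density(f_vals, lam, n):
    """r_n = ∫_{-π}^{π} e^{inλ} f(λ) dλ by the periodic rectangle rule."""
    dlam = lam[1] - lam[0]
    return np.sum(np.exp(1j * n * lam) * f_vals) * dlam

# x_{t+1} - a x_t = η_t, i.e. a_0 = 1, a_1 = -a and φ(z) = z - a.
a, sigma2 = 0.6, 1.0
lam = -np.pi + 2 * np.pi * np.arange(4096) / 4096     # equispaced grid on [-π, π)
f = ar_spectral_density([1.0, -a], sigma2, lam)
r_spec = np.array([covariance_from_density(f, lam, n).real for n in range(6)])
# Closed form for the stationary solution: r_n = σ² a^|n| / (1 - a²).
r_exact = sigma2 * a ** np.arange(6) / (1 - a ** 2)
```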

Call the zeros of $\varphi(z)$ in the complex plane $z_1, z_2, \ldots, z_p$, where the zeros are enumerated so that $z_1, z_2, \ldots, z_K$ are inside and $z_{K+1}, \ldots, z_p$ are outside the unit circle $|z| = 1$. To avoid unnecessary notation we shall assume all zeros to be simple. Then

$$\frac{1}{\varphi(z)} = \sum_{\nu=1}^{p}\frac{A_\nu}{z - z_\nu} = \sum_{\nu=1}^{K}\frac{A_\nu}{z}\sum_{\mu=0}^{\infty}\left(\frac{z_\nu}{z}\right)^{\mu} - \sum_{\nu=K+1}^{p}\frac{A_\nu}{z_\nu}\sum_{\mu=0}^{\infty}\left(\frac{z}{z_\nu}\right)^{\mu},$$

which is uniformly convergent for $|z| = 1$. Introducing this into (3) we get

$$x_t = \sum_{\nu=1}^{K} A_\nu\sum_{\mu=0}^{\infty} z_\nu^{\mu}\,\eta_{t-\mu-1} - \sum_{\nu=K+1}^{p} A_\nu\sum_{\mu=0}^{\infty} z_\nu^{-\mu-1}\,\eta_{t+\mu},$$

so that we have represented the process as a moving average

$$x_t = \sum_{\nu=-\infty}^{\infty} b_\nu\,\eta_{t-\nu}.$$

If all the roots are inside $|z| = 1$, this sum clearly extends over only positive values of $\nu$, so that only the past values $\eta_{t-1}, \eta_{t-2}, \ldots$ are involved.

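The covariances can be given a simple form in terms of the $b_\nu$, because

$$r_{s,t} = E x_s\bar x_t = \sigma^2\sum_{\nu=-\infty}^{\infty} b_{s-\nu}\,\bar b_{t-\nu},$$

or in matrix form

$$R = \{r_{s-t};\ s, t = \ldots, -1, 0, 1, \ldots\} = \sigma^2 B B^*,$$

where $B = \{b_{\nu-\mu};\ \nu, \mu = \ldots, -1, 0, 1, \ldots\}$. Here $B^*$ is the conjugate transpose of $B$. It is also easily seen that $B$ inverts the matrix

$$A = \{a_{\nu-\mu} \text{ if } 0 \le \nu - \mu \le p,\ = 0 \text{ otherwise}\}$$

up to a translation of the index by $p$: with the conventions above, $\sum_{j=0}^{p} a_j b_{m-j} = \delta_{m-p}$, so that $AB$ is the matrix of the translation by $p$ and $B = A^{-1}$ holds up to this translation.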
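When all zeros of $\varphi$ lie inside $|z| = 1$, the coefficients $b_\nu$ can be computed without partial fractions, as the response $x_t$ of the difference equation to a single unit impulse $\eta_0 = 1$. The sketch below assumes this causal case and an illustrative second-order equation with $\varphi(z) = (z - 0.5)(z + 0.3)$. It then checks two things: the convolution identity $\sum_j a_j b_{m-j}$, which equals 1 for $m = p$ and 0 otherwise under the indexing $x_t = \sum_\nu b_\nu \eta_{t-\nu}$, and the agreement of $r_n = \sigma^2\sum_k b_{k+n} b_k$ with $\int e^{in\lambda}\,\sigma^2/(2\pi|\varphi(e^{i\lambda})|^2)\,d\lambda$. The function name `ma_coefficients` is hypothetical.

```python
import numpy as np

def ma_coefficients(a, n_terms):
    """b_0, ..., b_{n_terms-1} in x_t = Σ_ν b_ν η_{t-ν} for the stationary solution of
    a_0 x_{t+p} + a_1 x_{t+p-1} + ... + a_p x_t = η_t, assuming every zero of φ lies
    inside |z| = 1. b_t is the response x_t to the single impulse η_0 = 1."""
    a = np.asarray(a, dtype=float)
    p = a.size - 1
    b = np.zeros(n_terms)                  # x_t = 0 for t < p: η_0 first enters x_p
    for t in range(p, n_terms):
        impulse = 1.0 if t == p else 0.0   # η_{t-p}
        past = b[t - p:t][::-1]            # x_{t-1}, ..., x_{t-p}
        b[t] = (impulse - np.dot(a[1:], past)) / a[0]
    return b

a = [1.0, -0.2, -0.15]        # φ(z) = z² - 0.2z - 0.15 = (z - 0.5)(z + 0.3), p = 2
sigma2 = 2.0
b = ma_coefficients(a, 200)

# Σ_j a_j b_{m-j}: the translation by p = 2, i.e. 1 at m = 2 and 0 elsewhere.
conv = np.convolve(a, b)[:200]

# r_n from the moving average, σ² Σ_k b_{k+n} b_k, versus the spectral integral.
r_ma = np.array([sigma2 * np.dot(b[n:], b[:b.size - n]) for n in range(5)])
lam = -np.pi + 2 * np.pi * np.arange(4096) / 4096
f = sigma2 / (2 * np.pi * np.abs(np.polyval(a, np.exp(1j * lam))) ** 2)
r_spec = np.array([np.sum(np.cos(n * lam) * f) * (2 * np.pi / 4096) for n in range(5)])
```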

The extension of the above result to a continuous time parameter is fairly obvious. Consider, e.g., a one-dimensional harmonic oscillator subject to friction and exposed to random impulses. The motion is governed by the Langevin equation

$$m\frac{d^2 x_t}{dt^2} + \beta\frac{dx_t}{dt} + \alpha x_t = \eta_t.$$

Here $m > 0$ is the mass of the particle, $\beta > 0$ the friction coefficient and $\alpha > 0$ a constant measuring the elastic force. The $\eta_t$ are the random shocks and are supposed to form a weakly stationary process with mean zero. As before

$$\eta_t = \int_{-\infty}^{\infty} e^{it\lambda}\,dz_\eta(\lambda), \qquad E\eta_s\bar\eta_t = \int_{-\infty}^{\infty} e^{i(s-t)\lambda}\,dF_\eta(\lambda),$$

and it is supposed that $\eta_s$ and $\eta_t$ are uncorrelated unless $s - t$ is very small compared to the period of the undisturbed particle. This means that the $\eta$-spectrum has a constant density $f_\eta$ over a wide range of frequencies. Strictly speaking, $f_\eta(\lambda)$ could not be constant for all frequencies, as this would imply infinite variance and would not have any physical meaning. Reasoning in the same way as before we get

$$x_t = \int_{-\infty}^{\infty}\frac{e^{it\lambda}}{-m\lambda^2 + i\beta\lambda + \alpha}\,dz_\eta(\lambda),$$

$$f_x(\lambda) = \frac{f_\eta(\lambda)}{|-m\lambda^2 + i\beta\lambda + \alpha|^2} \approx \frac{f_\eta}{|-m\lambda^2 + i\beta\lambda + \alpha|^2},$$

where the second relation is only an approximate equality and holds in the large frequency band we have considered. One can also obtain a moving average representation for the process.

We shall derive a general result concerning the representation of a class of weakly stationary stochastic processes as moving averages. For a change, the proofs will be carried out for continuous time, but the result and derivation are almost the same in the discrete case. First let us note that if we have a moving average with respect to an orthogonal process $\eta(s)$,

$$x_t = \int_{-\infty}^{\infty} a(t - s)\,d\eta(s), \tag{4}$$

with

$$E|\eta(s_1) - \eta(s_2)|^2 = |s_1 - s_2|, \qquad \int_{-\infty}^{\infty}|a(s)|^2\,ds < \infty, \tag{5}$$
then $x_t$ has an absolutely continuous spectrum. This follows from (see section 1.3)

$$E x_s\bar x_t = \int_{-\infty}^{\infty} a(s - \tau)\overline{a(t - \tau)}\,d\tau = \int_{-\infty}^{\infty} e^{i(s-t)\lambda}|\gamma(\lambda)|^2\,d\lambda.$$

Here we have used Parseval's relation, and $\gamma(\lambda)$ is defined as the Fourier-Plancherel transform of $a(t)$,

$$\gamma(\lambda) = \frac{1}{\sqrt{2\pi}}\lim_{A\to\infty}\int_{-A}^{A} e^{-i\lambda\tau}a(\tau)\,d\tau,$$

the limit being taken in the mean. The spectral density is $|\gamma(\lambda)|^2$.

On the other hand, if the spectrum is absolutely continuous and $f(\lambda)$ is the spectral density, we take any measurable square root of it, i.e., a function $\gamma(\lambda)$ satisfying

$$|\gamma(\lambda)|^2 = f(\lambda).$$

As $\gamma(\lambda)$ is quadratically integrable we can define

$$a(\tau) = \frac{1}{\sqrt{2\pi}}\lim_{A\to\infty}\int_{-A}^{A} e^{i\lambda\tau}\gamma(\lambda)\,d\lambda,$$

and we get as before

$$r_{s,t} = \int_{-\infty}^{\infty} a(s - \tau)\overline{a(t - \tau)}\,d\tau.$$

Applying the representation theorem of section 1.4 we see that there is an orthogonal stochastic set function $\eta(s)$ satisfying (4) and (5).
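Both sides of the relation between the kernel $a$ and the spectral density $|\gamma(\lambda)|^2$ can be compared numerically on a standard example. We assume, for illustration, the kernel $a(\tau) = e^{-\tau}$ for $\tau \ge 0$ and $a(\tau) = 0$ for $\tau < 0$; then $|\gamma(\lambda)|^2 = 1/(2\pi(1 + \lambda^2))$, and both $\int a(s-\tau)a(t-\tau)\,d\tau$ and $\int e^{i(s-t)\lambda}|\gamma(\lambda)|^2\,d\lambda$ should equal $\frac12 e^{-|s-t|}$. For a real kernel like this one the sign convention in the exponent defining $\gamma$ does not affect $|\gamma(\lambda)|^2$. The spectral integral is truncated to $[-L, L]$, which leaves an error of roughly $1/(\pi L)$ at lag 0.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def kernel(u):
    """a(u) = e^{-u} for u >= 0 and 0 for u < 0 (illustrative choice)."""
    return np.where(u >= 0, np.exp(-np.abs(u)), 0.0)

def spectral_density(lam):
    """|γ(λ)|² = 1 / (2π (1 + λ²)) for this kernel."""
    return 1.0 / (2 * np.pi * (1.0 + lam ** 2))

def cov_time_domain(d, upper=40.0, n=400_001):
    """∫ a(s - τ) a(t - τ) dτ with d = s - t, rewritten as ∫_0^∞ a(u + |d|) a(u) du."""
    u = np.linspace(0.0, upper, n)
    return trapezoid(kernel(u + abs(d)) * kernel(u), u)

def cov_spectral(d, L=2000.0, n=800_001):
    """∫ e^{idλ} |γ(λ)|² dλ over [-L, L]; the density is even, so only cos(dλ) remains."""
    lam = np.linspace(-L, L, n)
    return trapezoid(np.cos(d * lam) * spectral_density(lam), lam)

for d in [0.0, 0.5, 1.0, 2.0]:
    print(d, cov_time_domain(d), cov_spectral(d), 0.5 * np.exp(-d))
```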

As an example consider the output current $x_t$ of a vacuum tube. Due to the atomic nature of electricity, the current we observe at time $t$ is the result of all the electrons arriving at the anode at $t$ or earlier. The tube and circuit may be characterized by a function $g(t)$ giving the contribution to the current observed at time $t$ due to the arrival of an electron at time 0; $g(t)$ should be assumed integrable together with its square. Assuming the arrivals to happen independently of each other and with an intensity $\beta$ (the probability of an arrival in the interval $(t, t + \Delta t)$ is approximately $\beta\Delta t$ for small $\Delta t$), the number $n(t)$ of arrivals during the time $(0, t)$ forms a Poisson process (see section 1.1).

If the effects of the electrons superimpose in a linear way, the current at time $t$ is

$$x_t = \sum_{t_\nu \le t} g(t - t_\nu),$$

where the $t_\nu$ are the time points of the arrivals of the electrons. But this can be written

$$x_t = \int_{-\infty}^{\infty} g(t - \tau)\,dn(\tau) \tag{6}$$

(as $g(t)$ would be zero for negative values of $t$, the integration need only be taken over $(-\infty, t)$). As $n(\tau)$ is a homogeneous stochastic set function with

$$E[n(s_2) - n(s_1)] = \beta(s_2 - s_1),$$

$$E[n(s_2) - n(s_1) - \beta(s_2 - s_1)]^2 = \beta|s_2 - s_1|,$$

we see that (6) is a moving average representation of the current. The only difference from (4) is that here the mean value is not zero but

$$E x_t = \beta\int_{-\infty}^{\infty} g(t - \tau)\,d\tau = \beta\int_0^{\infty} g(\tau)\,d\tau.$$

The current then consists of a d.c. component $\beta\int_0^{\infty} g(\tau)\,d\tau$ plus a shot noise current with an absolutely continuous spectrum and a spectral density

$$\frac{\beta}{2\pi}\left|\int_0^{\infty} e^{i\lambda\tau} g(\tau)\,d\tau\right|^2.$$

The covariance is

$$r_{s,t} = \beta\int_{-\infty}^{\min(s,t)} g(s - \tau)\,g(t - \tau)\,d\tau.$$
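The mean and covariance of the shot noise current can be checked by simulating Poisson arrivals directly. The sketch assumes, for illustration, $g(t) = e^{-t}$ for $t \ge 0$ and intensity $\beta = 3$; then $E x_t = \beta\int_0^\infty g = \beta$, $\operatorname{var} x_t = \beta\int_0^\infty g^2 = \beta/2$, and $r_{0,1/2} = \beta\int_{-\infty}^{0} e^{\tau}e^{-(1/2-\tau)}\,d\tau = \beta e^{-1/2}/2$. Arrivals earlier than $t = -30$ are ignored, since their contribution is below $e^{-30}$.

```python
import numpy as np

rng = np.random.default_rng(5)
beta = 3.0                                    # intensity of arrivals (illustrative)
g = lambda u: np.where(u >= 0, np.exp(-np.abs(u)), 0.0)   # g(t) = e^{-t}, t >= 0
start, t1, t2 = -30.0, 0.0, 0.5              # arrivals before `start` are negligible
n_real = 40_000                               # number of simulated realizations

counts = rng.poisson(beta * (t2 - start), size=n_real)   # arrivals per realization
owner = np.repeat(np.arange(n_real), counts)              # realization of each arrival
times = rng.uniform(start, t2, size=owner.size)           # arrival epochs t_ν
# x_t = Σ_{t_ν <= t} g(t - t_ν), accumulated per realization.
x1 = np.bincount(owner, weights=g(t1 - times), minlength=n_real)
x2 = np.bincount(owner, weights=g(t2 - times), minlength=n_real)

mean_emp = x1.mean()                                       # ≈ β ∫ g = β
var_emp = x1.var()                                         # ≈ β ∫ g² = β/2
cov_emp = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))     # ≈ β e^{-1/2} / 2
```

Given the number of arrivals in an interval, the arrival epochs of a Poisson process are independent and uniform, which is what the vectorized construction uses.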


In the case of discrete time one gets instead the representation

$$x_t = \sum_{\nu=-\infty}^{\infty} a_{t-\nu}\,\xi_\nu,$$

where the $\xi$'s form white noise with variance $\sigma^2$ and $\sum|a_\nu|^2 < \infty$. The spectral density is

$$f(\lambda) = \frac{\sigma^2}{2\pi}\left|\sum_{\nu=-\infty}^{\infty} a_\nu e^{-i\nu\lambda}\right|^2.$$

If the $\xi$'s are independent and identically distributed stochastic variables, we refer to the $\xi$ process as pure white noise. We call $x_t$ a linear process if the $\xi$ process is pure white noise.


1.7. Time and Ensemble Averages

So far we have described the stationary processes in terms of means and covariances. They are the averages of the process at one or two time points. Average here means averaging over all possible states, that is over $\Omega$, at those times. If we consider the process as an ensemble of possible functions of time, e.g., paths in phase space, one could call the averages ensemble (or phase) averages.

In many cases this is not an operational definition, as we often have only one realization of the process at our disposal. It is, however, intuitively plausible that we would obtain the same result by averaging over time instead, as the process is stationary and its structure does not change with time. The mathematical counterpart of this is the ergodic theorem in its various formulations.

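Suppose that we want to measure $m = Ex_t$ of a weakly stationary process with the spectral representation

$$x_t - m = \int_{-\pi}^{\pi} e^{it\lambda}\,dz(\lambda), \qquad r_{s,t} = \int_{-\pi}^{\pi} e^{i(s-t)\lambda}\,dF(\lambda), \qquad s,t = \dots,-1,0,1,\dots$$

Consider the time average

$$\frac{1}{N}\sum_{v=1}^{N} x_v - m = \int_{-\pi}^{\pi}\frac{1}{N}\sum_{v=1}^{N} e^{iv\lambda}\,dz(\lambda) = \Delta z(0) + \int_{-\pi}^{\pi}\frac{e^{i\lambda}}{N}\,\frac{1-e^{iN\lambda}}{1-e^{i\lambda}}\,dz_0(\lambda),$$

where $\Delta z(0)$ is the jump (possibly zero) of $z(\lambda)$ at $0$, and

$$z_0(\lambda) = \begin{cases} z(\lambda), & \lambda < 0,\\ z(\lambda) - \Delta z(0), & \lambda > 0.\end{cases}$$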


From section 1.3 it follows that

$$E\left|\frac{1}{N}\sum_{v=1}^{N} x_v - m - \Delta z(0)\right|^2 = \int_{-\pi}^{\pi}\frac{\sin^2(N\lambda/2)}{N^2\sin^2(\lambda/2)}\,dF_0(\lambda), \tag{1}$$

where $F_0(\lambda)$ coincides with $F(\lambda)$ except that a possible discontinuity at $\lambda = 0$ has been removed from $F(\lambda)$. Since the integrand never exceeds 1, this expression is at most equal to

$$\frac{1}{N^2\sin^2(\varepsilon/2)}\int_{\varepsilon\le|\lambda|\le\pi} dF(\lambda) + \int_{0<|\lambda|<\varepsilon} dF_0(\lambda), \qquad 0<\varepsilon<\pi.$$

Choosing $\varepsilon$ so that the second term is sufficiently small and then taking $N$ large, we see that the right hand side of (1) tends to zero as $N\to\infty$. Hence the time average converges in the mean to $m$ plus the stochastic variable $\Delta z(0)$ of variance $\Delta F(0)$. If there is no discrete spectral mass at $\lambda = 0$, the time average converges to the ensemble average. This is called the statistical ergodic theorem. In taking a time average of the process we are averaging one realization of the process, and if $\Delta z(0)\ne0$ we have an additional term which is of course constant for this realization. One cannot obtain information about the variance of the random term $\Delta z(0)$ without considering several realizations of the process. The reader may note that this is a simple version of what is called the weak law of large numbers in probability theory.
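To see the bound at work, the sketch below evaluates the right-hand side of (1) for a process whose covariances are assumed to be $r_k=\rho^{|k|}$, a unit-variance first-order autoregressive sequence chosen only for illustration. Its spectrum has no mass at $\lambda=0$, so $\Delta z(0)=0$ and (1) is simply the variance of the time average, which can also be computed directly as $N^{-2}\sum_{s,t} r_{s-t}$.

```python
import math


def ar1_spectral_density(lam, rho):
    """Spectral density on (-pi, pi) of a unit-variance sequence with r_k = rho**|k|, |rho| < 1."""
    return (1.0 - rho**2) / (2.0 * math.pi * (1.0 - 2.0 * rho * math.cos(lam) + rho**2))


def fejer_weight(lam, n):
    """sin^2(n lam / 2) / (n^2 sin^2(lam / 2)): the weight in (1); it never exceeds 1."""
    return math.sin(n * lam / 2.0) ** 2 / (n**2 * math.sin(lam / 2.0) ** 2)


def var_time_average_spectral(n, rho, grid=4096):
    """Right-hand side of (1) by the midpoint rule; the midpoints never fall on lam = 0."""
    step = 2.0 * math.pi / grid
    total = 0.0
    for j in range(grid):
        lam = -math.pi + (j + 0.5) * step
        total += fejer_weight(lam, n) * ar1_spectral_density(lam, rho)
    return total * step


def var_time_average_direct(n, rho):
    """(1 / n^2) * sum_{s,t=1..n} rho^{|s-t|}, grouped by lag."""
    return (n + 2.0 * sum((n - k) * rho**k for k in range(1, n))) / n**2


for n in (1, 10, 100, 1000):
    print(n, var_time_average_spectral(n, 0.6), var_time_average_direct(n, 0.6))
```

Both columns shrink roughly like $1/N$, in line with the statistical ergodic theorem for a spectrum without a jump at the origin.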
In the same way one can show that

$$\frac{1}{N}\sum_{v=1}^{N}(x_v - m)\,e^{-iv\mu} \tag{2}$$

converges in the mean square to $\Delta z(\mu)$. Averages of the form (2) isolate the jumps $\Delta z(\lambda)$ if there are any.
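A minimal illustration, assuming a process with a purely discrete spectrum $x_t=\sum_k Z_k e^{it\mu_k}$ and mean $m=0$, observed in a single realization in which the amplitudes $Z_k$ have already been drawn. The amplitudes and frequencies below are arbitrary illustrative values. The average (2) picks out $Z_k=\Delta z(\mu_k)$ at $\mu=\mu_k$ and tends to zero at other frequencies.

```python
import cmath


def realization(amplitudes, frequencies, n):
    """x_1, ..., x_n for x_t = sum_k Z_k e^{i t mu_k}, the amplitudes Z_k being already drawn."""
    return [sum(z * cmath.exp(1j * t * mu) for z, mu in zip(amplitudes, frequencies))
            for t in range(1, n + 1)]


def jump_average(xs, mu):
    """The average (2) with m = 0: (1/N) sum_{v=1}^{N} x_v e^{-i v mu}."""
    return sum(x * cmath.exp(-1j * v * mu) for v, x in enumerate(xs, start=1)) / len(xs)


amplitudes = [2.0 + 1.0j, -0.5]  # hypothetical values of the jumps Delta z(mu_k) in one realization
frequencies = [0.7, 2.1]
xs = realization(amplitudes, frequencies, 20_000)
for mu in (0.7, 2.1, 1.3):
    print(mu, jump_average(xs, mu))
```

The leakage from a line at $\mu_k$ is at most $|Z_k|/(N|\sin((\mu_k-\mu)/2)|)$, so the convergence is slow when two lines are close together.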

If $x_t$ is strictly stationary and $E|x_t| < \infty$, one can prove a stronger statement: the time average $\frac{1}{N}\sum_{v=1}^{N} x_v$ will converge to a random variable for almost all realizations. This is called the individual ergodic theorem (for a proof see Doob [1]).

Taking $\omega = \{x_t,\ t = \dots,-1,0,1,\dots\}$, and $\Omega$ as the space of all such sample sequences, we can define the transformation

$$T\omega = \{x_{t-1},\ t = \dots,-1,0,1,\dots\},$$

i.e., $T$ is the translation operator. From the strict stationarity it follows immediately that $\{T^n\}$ is a group of measure preserving transformations. For each function $L(\omega)$ which is absolutely integrable on $\Omega$ with respect to $P$-measure, $L(T^t\omega)$ is a strictly stationary process and its time average converges to a stochastic variable $\bar L$ (see the last paragraph). If $\bar L$ reduces to a constant almost certainly for each such $L(\omega)$, then $x_t$ is called an ergodic process.

It is clear from the above that if the spectrum of $x_t$ has a jump at $\lambda = 0$, the process cannot be ergodic. On the other hand, one cannot in general hope to reduce the ergodic property to properties of the spectrum, as the latter determines only the second order moments. Ergodicity is related to the entire probability structure of the process. However, there are two simple cases that have been studied.

If $x_t$ is a normal process, one can show that a necessary and sufficient condition for it to be ergodic is that the spectrum be continuous. If $x_t$ is a linear process, it has been shown that $x_t$ is an ergodic process. It should be mentioned that in the case of a continuous time parameter the definitions and proofs of this section are quite analogous.

Consider, e.g., the process (1.6.6). As it is a linear process it follows that

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T x_t\,dt = \beta\int_0^\infty g(\tau)\,d\tau = m,$$

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T (x_t - m)^2\,dt = \beta\int_0^\infty g^2(\tau)\,d\tau$$

with probability one. This is known as Campbell's theorem (see Rice [1]).
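Campbell's theorem can be checked by simulation. The sketch below assumes an exponential pulse $g(\tau)=e^{-a\tau}$, $\tau\ge0$, for which $\beta\int_0^\infty g=\beta/a$ and $\beta\int_0^\infty g^2=\beta/(2a)$, and a Poisson stream of emission times with rate $\beta$; the pulse shape, the parameter values and the function names are illustrative choices. For this pulse the current obeys an exact one-step recursion on a time grid, so individual pulses need not be stored.

```python
import math
import random


def simulate_shot_noise(beta, a, t_total, dt, seed=0):
    """Sample x_t = sum_k exp(-a (t - tau_k)) over emissions tau_k <= t, on the grid dt, 2 dt, ...

    The emission times form a Poisson stream of rate beta. Between grid points the current
    decays by the factor exp(-a dt), and each emission in (t - dt, t] adds its own pulse.
    """
    rng = random.Random(seed)
    decay = math.exp(-a * dt)
    x = beta / a  # start at the stationary mean to shorten the transient (an assumption)
    t = 0.0
    next_emission = rng.expovariate(beta)
    samples = []
    for _ in range(int(t_total / dt)):
        t += dt
        x *= decay
        while next_emission <= t:
            x += math.exp(-a * (t - next_emission))
            next_emission += rng.expovariate(beta)
        samples.append(x)
    return samples


def time_mean_and_variance(samples, burn_in):
    """Time average and time variance of one realization, discarding the first burn_in samples."""
    kept = samples[burn_in:]
    mean = sum(kept) / len(kept)
    return mean, sum((x - mean) ** 2 for x in kept) / len(kept)


beta, a = 5.0, 1.0
samples = simulate_shot_noise(beta, a, t_total=5000.0, dt=0.05, seed=1)
mean, var = time_mean_and_variance(samples, burn_in=200)
print(mean, beta / a)          # time average versus beta * integral of g
print(var, beta / (2.0 * a))   # time variance versus beta * integral of g^2
```

Since the time averages come from a single realization, they only agree with the ensemble values up to a statistical error that shrinks as the observation interval grows.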

1.8. Vector Processes

Let $T$ be $n$-dimensional Euclidean space and $x_t = (x_t^{(1)}, x_t^{(2)}, \dots, x_t^{(m)})$ a weakly stationary process taking as values vectors in $m$-space. Assuming the constant mean value vector to be zero, we introduce the covariance matrix

$$R(s-t) = \{r_{j,k}(s-t)\} = \{Ex_s^{(j)}\overline{x_t^{(k)}};\ j,k = 1,2,\dots,m\}, \qquad s,t\in T.$$

As before, $R(-t) = R(t)^*$, where $^*$ denotes the conjugate transpose. For any $m$-vector $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_m)$ the stochastic process $\alpha\cdot x_t = \sum_j\alpha_j x_t^{(j)}$ is complex-valued and weakly stationary. We can now apply the spectral representation theorem of section 1.6. We then have

$$E(\alpha\cdot x_s)\overline{(\alpha\cdot x_t)} = \int_\Lambda e^{i(s-t)\cdot\lambda}\,dF_\alpha(\lambda), \tag{1}$$

where the spectral distribution function $F_\alpha(\lambda)$ depends upon $\alpha$. We have

$$E(\alpha\cdot x_s)\overline{(\alpha\cdot x_t)} = \sum_{j,k=1}^{m}\alpha_j\bar\alpha_k\,r_{j,k}(s-t) = \alpha R(s-t)\alpha^*.$$

But then $F_\alpha(\lambda) = \alpha F(\lambda)\alpha^*$, where $F(\lambda)$ is a function of $\lambda$ taking as values Hermitian $m\times m$ matrices. Furthermore the increment over an interval is

$$\alpha\,\Delta F(\lambda)\,\alpha^* = \Delta F_\alpha(\lambda) \ge 0, \tag{2}$$

so that $\Delta F(\lambda)$ is a nonnegative definite matrix. Equating coefficients of the two quadratic forms in (1) we get

$$r_{j,k}(s-t) = \int_\Lambda e^{i(s-t)\cdot\lambda}\,dF_{j,k}(\lambda),$$

where $F_{j,k}(\lambda)$ is the element of the matrix $F(\lambda)$ with subscript $(j,k)$. Taking $\alpha_j = 1$, $\alpha_v = 0$ for $v\ne j$, we get from (2)

$$\Delta F_{j,j}(\lambda) \ge 0.$$

As $F_{j,j}(\infty) - F_{j,j}(-\infty) = r_{j,j}(0) < \infty$, we see that the diagonal elements of $F(\lambda)$ are bounded nondecreasing functions.

Setting $\alpha_j = a$, $\alpha_k = b$, $\alpha_v = 0$ for $v\ne j,k$ and using (2) we find that

$$|a|^2\,\Delta F_{j,j}(\lambda) + 2\,\mathrm{Re}\,a\bar b\,\Delta F_{j,k}(\lambda) + |b|^2\,\Delta F_{k,k}(\lambda) \ge 0$$

for any $a$, $b$, so that

$$|\Delta F_{j,k}(\lambda)| \le \sqrt{\Delta F_{j,j}(\lambda)\,\Delta F_{k,k}(\lambda)}.$$

As $F_{j,j}(\lambda)$ and $F_{k,k}(\lambda)$ are of bounded variation, a simple application of Schwarz' inequality shows that all the elements of $F(\lambda)$ are of bounded variation. One should note, however, that for $j\ne k$ the function $F_{j,k}(\lambda)$ is not necessarily nondecreasing. Hence we have proved the extended spectral representation

$$R(s-t) = \int_\Lambda e^{i(s-t)\cdot\lambda}\,dF(\lambda). \tag{3}$$

$F_{j,j}(\lambda)$ is clearly the spectral distribution function of the $j$th component of $x_t$. $F_{j,k}$ is called the cross spectral distribution function of the $j$th and $k$th components. This result is due to Cramér [1]. In the discrete case one obtains the analogous representation.
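As a concrete discrete-time case of (3), the sketch below takes a hypothetical bivariate moving average built from two independent unit-variance white noise sequences $\xi_t$, $\eta_t$: $x_t^{(1)}=\xi_t$ and $x_t^{(2)}=0.8\,\xi_{t-1}+0.6\,\eta_t$ (an illustrative example, not from the text). Its spectral density matrix is $\frac{1}{2\pi}H(\lambda)H(\lambda)^*$, where $H$ holds the transfer functions, so the cross spectral density is complex-valued, and the bound $|f_{1,2}|^2\le f_{1,1}f_{2,2}$ can be checked directly.

```python
import cmath
import math


def transfer(lam):
    """H(lam): row j expresses component j in terms of the noises (xi, eta)."""
    return [[1.0, 0.0], [0.8 * cmath.exp(-1j * lam), 0.6]]


def spectral_matrix(lam):
    """f_{jk}(lam) = (1 / 2 pi) sum_l H_{jl}(lam) conj(H_{kl}(lam))."""
    h = transfer(lam)
    return [[sum(h[j][l] * h[k][l].conjugate() for l in range(2)) / (2.0 * math.pi)
             for k in range(2)] for j in range(2)]


def cross_covariance(k, j1=0, j2=1, grid=64):
    """r_{j1 j2}(k) = E x_{t+k}^{(j1)} conj(x_t^{(j2)}), recovered from f_{j1 j2} by a Riemann sum."""
    step = 2.0 * math.pi / grid
    total = 0j
    for n in range(grid):
        lam = -math.pi + n * step
        total += cmath.exp(1j * k * lam) * spectral_matrix(lam)[j1][j2]
    return total * step


for lam in (0.0, 1.0, 2.5):
    f = spectral_matrix(lam)
    coherence = abs(f[0][1]) ** 2 / (f[0][0].real * f[1][1].real)
    print(lam, f[0][1], coherence)
```

Here the squared coherence $|f_{1,2}|^2/(f_{1,1}f_{2,2})$ is $0.64$ at every frequency, while $f_{1,2}(\lambda)=0.8e^{i\lambda}/2\pi$ rotates in the complex plane, which is why $F_{1,2}$ is not a monotone function.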

It is easily shown that if $R(s-t)$ is given in the form (3), it can be considered as the covariance function of a stationary process.

Considering each component of the process $x_t$, we find immediately that the process itself can be represented as

$$x_t = \int_\Lambda e^{it\cdot\lambda}\,dz(\lambda), \tag{4}$$

where $z(\lambda)$ is a vector valued stochastic set function with components $z_1(\lambda), z_2(\lambda), \dots, z_m(\lambda)$ and

$$Ez_j(S_1)\overline{z_k(S_2)} = \int_{S_1\cap S_2} dF_{j,k}(\lambda).$$

Consider as an example a linear passive electric network consisting of $m$ loops. It is governed by the system of differential equations

$$L\frac{d^2x_t}{dt^2} + R\frac{dx_t}{dt} + Gx_t = \xi_t, \qquad -\infty<t<\infty, \tag{5}$$

where the constant matrices $L$, $R$ and $G$ are determined by the network. $\xi_t$ is an $m$-vector having as its components the random driving voltages to which the loops of the system are subject. Similarly the components of $x_t$ represent the charges, and we are interested in finding any stationary solution of (5) where $\xi_t$ represents noise.

If the $m$ different noise sources can be considered as different and essentially independent, we have no cross correlation, i.e., $E\xi_s^{(j)}\overline{\xi_t^{(k)}} = 0$ if $j\ne k$, in which case we say that the $m$ components of the noise are incoherent. Then clearly $F_\xi(\lambda)$ reduces to a diagonal matrix. In the general case, however, the cross spectra of the $\xi$-process may not vanish, in which case we have coherent noise.

In any case we represent $\xi_t$ by

$$\xi_t = \int_{-\infty}^{\infty} e^{it\lambda}\,dz_\xi(\lambda), \qquad E\xi_s^{(j)}\overline{\xi_t^{(k)}} = \int_{-\infty}^{\infty} e^{i(s-t)\lambda}f^{\xi}_{j,k}(\lambda)\,d\lambda.$$

We have assumed that the spectra of the $m$ components are absolutely continuous, which implies, as we know, that the cross spectra have the same property.

If $x_t$ is a weakly stationary solution of (5), which then can be represented as (4), we should have (assuming of course that the first and second derivatives in the mean exist)

$$\int_{-\infty}^{\infty} e^{it\lambda}\,[-\lambda^2L + i\lambda R + G]\,dz_x(\lambda) = \int_{-\infty}^{\infty} e^{it\lambda}\,dz_\xi(\lambda), \qquad -\infty<t<\infty.$$

This implies as in section 1.6 that

$$M_\lambda\,dF_x(\lambda)\,M_\lambda^* = dF_\xi(\lambda) = \{f^{\xi}_{j,k}(\lambda);\ j,k = 1,2,\dots,m\}\,d\lambda = f^{\xi}(\lambda)\,d\lambda,$$

where $M_\lambda = -\lambda^2L + i\lambda R + G$.

Then we see that a necessary and sufficient condition for the existence of a stationary solution is that $M_\lambda$ be nonsingular for all real $\lambda$, i.e., that the undisturbed system have no undamped eigen-oscillation. If this condition is satisfied, the weakly stationary process defined by

$$x_t = \int_{-\infty}^{\infty} e^{it\lambda}M_\lambda^{-1}\,dz_\xi(\lambda)$$

with the spectral density

$$f^{x}(\lambda) = \{f^{x}_{j,k}(\lambda);\ j,k = 1,2,\dots,m\} = M_\lambda^{-1}f^{\xi}(\lambda)\,(M_\lambda^{-1})^*$$

is the stationary solution of (5).
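The recipe $f^x=M_\lambda^{-1}f^\xi(M_\lambda^{-1})^*$ is easy to evaluate numerically. The sketch below is restricted to $m=2$ loops so that the matrix inverse can be written out by hand; the values of $L$, $R$, $G$ and of the driving spectral density are hypothetical. It raises an error when $M_\lambda$ is singular, which is exactly the undamped eigen-oscillation excluded above.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def conj_transpose(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-14:
        raise ValueError("M_lambda is singular: an undamped eigen-oscillation at this frequency")
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def m_lambda(lam, L, R, G):
    """M_lambda = -lam^2 L + i lam R + G for 2 x 2 matrices stored as nested lists."""
    return [[-lam**2 * L[i][j] + 1j * lam * R[i][j] + G[i][j] for j in range(2)] for i in range(2)]


def response_spectral_density(lam, L, R, G, f_xi):
    """f^x(lam) = M_lambda^{-1} f^xi(lam) (M_lambda^{-1})^*; f_xi returns a 2 x 2 Hermitian matrix."""
    m_inv = inverse(m_lambda(lam, L, R, G))
    return mat_mul(mat_mul(m_inv, f_xi(lam)), conj_transpose(m_inv))


# Hypothetical two-loop network without coupling, driven by incoherent white noise.
L = [[1.0, 0.0], [0.0, 2.0]]
R = [[0.5, 0.0], [0.0, 0.3]]
G = [[4.0, 0.0], [0.0, 1.0]]


def white(lam):
    return [[0.1, 0.0], [0.0, 0.2]]


print(response_spectral_density(2.0, L, R, G, white))
```

With uncoupled loops and incoherent noise the response spectrum stays diagonal and each entry reduces to $f^\xi_{j,j}/|G_{jj}-\lambda^2L_{jj}+i\lambda R_{jj}|^2$; coupling through off-diagonal entries of $L$, $R$, $G$ or coherent driving noise produces nonzero cross spectra.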

In certain applications the covariance structure of the process is characterized by certain symmetry properties in addition to stationarity. Let us first deal with a complex-valued process of a vector argument $t$ in Euclidean $n$-space $\Lambda$ such that its covariance function $r_t$ is invariant under rotations of the argument space around the origin.

We have

$$r_t = \int_\Lambda e^{it\cdot\lambda}\,dF(\lambda).$$

If $O$ is any orthogonal $n\times n$ matrix we have

$$r_t = r_{Ot} = \int_\Lambda e^{iOt\cdot\lambda}\,dF(\lambda) = \int_\Lambda e^{it\cdot O^{-1}\lambda}\,dF(\lambda) = \int_\Lambda e^{it\cdot\lambda}\,dF(O\lambda),$$

so that $dF(\lambda) = dF(O\lambda)$. This shows that the spectral mass is symmetric with respect to rotations around the origin.
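Introduce the distribution function

$$G(u) = \int_{|\lambda|\le u} dF(\lambda), \qquad 0\le u<\infty,$$

and

$$K(u,t) = \int_{|\lambda| = u} e^{it\cdot\lambda}\,\frac{ds}{s(u)},$$

where the integration is carried out over the surface $|\lambda| = u$, $ds$ is the area element, and $s(u)$ the total area of the surface $|\lambda| = u$,

$$s(u) = \frac{2\pi^{n/2}}{\Gamma(n/2)}\,u^{n-1}.$$

We have, as is easily verified,

$$r_t = \int_0^\infty K(u,t)\,dG(u).$$

To evaluate $K(u,t)$ we first integrate $e^{it\cdot\lambda}$ over the ball $|\lambda|\le u$. Since the expression depends only upon the length $\tau = |t|$ of the vector $t$, we have taken $t = (\tau, 0, \dots, 0)$. Putting

$$\lambda_1 = \rho\cos\varphi_1,\quad \lambda_2 = \rho\sin\varphi_1\cos\varphi_2,\quad\dots,\quad \lambda_{n-1} = \rho\sin\varphi_1\cdots\sin\varphi_{n-2}\cos\varphi_{n-1},\quad \lambda_n = \rho\sin\varphi_1\cdots\sin\varphi_{n-1}$$

defines (for $n\ge3$) a one-one correspondence, up to a set of measure zero, between $|\lambda|\le u$ and $0\le\rho\le u$, $0\le\varphi_1,\dots,\varphi_{n-2}\le\pi$, $0\le\varphi_{n-1}<2\pi$, with the Jacobian $\rho^{n-1}\sin^{n-2}\varphi_1\sin^{n-3}\varphi_2\cdots\sin\varphi_{n-2}$. Integrating out $\varphi_2,\dots,\varphi_{n-1}$ gives the area $2\pi^{(n-1)/2}/\Gamma\bigl(\frac{n-1}{2}\bigr)$ of the unit sphere in $(n-1)$-space, so that (the same formula holds for $n=2$)

$$\int_{|\lambda|\le u} e^{it\cdot\lambda}\,d\lambda = \frac{2\pi^{(n-1)/2}}{\Gamma\bigl(\frac{n-1}{2}\bigr)}\int_0^u\rho^{n-1}\int_0^\pi e^{i\tau\rho\cos\varphi}\sin^{n-2}\varphi\,d\varphi\,d\rho. \tag{6}$$

As

$$\int_0^\pi e^{iz\cos\varphi}\sin^{n-2}\varphi\,d\varphi = \sqrt\pi\,\Gamma\!\left(\frac{n-1}{2}\right)\left(\frac{2}{z}\right)^{\frac{n-2}{2}}J_{\frac{n-2}{2}}(z),$$

we get, using $\frac{d}{dz}\bigl[z^\nu J_\nu(z)\bigr] = z^\nu J_{\nu-1}(z)$,

$$\int_{|\lambda|\le u} e^{it\cdot\lambda}\,d\lambda = 2\pi^{n/2}\left(\frac{2}{\tau}\right)^{\frac{n-2}{2}}\int_0^u\rho^{n/2}J_{\frac{n-2}{2}}(\tau\rho)\,d\rho = \left(\frac{2\pi u}{\tau}\right)^{n/2}J_{n/2}(\tau u).$$

Hence by differentiating this with respect to $u$ we get $K(u,t)\,s(u)$, so that, by the same identity,

$$K(u,t)\,s(u) = \frac{d}{du}\left[\left(\frac{2\pi u}{\tau}\right)^{n/2}J_{n/2}(\tau u)\right] = \left(\frac{2\pi u}{\tau}\right)^{n/2}\tau\,J_{\frac{n-2}{2}}(\tau u),$$

$$K(u,t) = 2^{\frac{n-2}{2}}\,\Gamma\!\left(\frac{n}{2}\right)(\tau u)^{-\frac{n-2}{2}}J_{\frac{n-2}{2}}(\tau u)$$

(see Whittaker-Watson [1], p. 360). Hence

$$r_t = \Gamma\!\left(\frac{n}{2}\right)\int_0^\infty\left(\frac{2}{u|t|}\right)^{\frac{n-2}{2}}J_{\frac{n-2}{2}}(u|t|)\,dG(u),$$

where the integral converges absolutely. Conversely any such representation defines a covariance function. This result is due to Bochner [1].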
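The kernel $K(u,t)$ depends on $n$, $u$ and $|t|$ only through $z=u|t|$. Expanding the Bessel function in its power series gives $K=\sum_{k\ge0}\frac{\Gamma(n/2)}{k!\,\Gamma(k+n/2)}\left(-\frac{z^2}{4}\right)^k$, which the sketch below sums directly (a choice made here to avoid a special-function library; it is adequate for moderate $z$). As checks, the spherical average of $e^{it\cdot\lambda}$ reduces to $\cos z$ for $n=1$ and to $\sin z/z$ for $n=3$, and a Monte Carlo average over random directions is included for comparison. The function names are illustrative.

```python
import math
import random


def isotropic_kernel(n, z, terms=80):
    """K(u, t) = Gamma(n/2) (2/z)^((n-2)/2) J_{(n-2)/2}(z), z = u |t|, via its power series

    sum_k Gamma(n/2) / (k! Gamma(k + n/2)) (-z^2 / 4)^k, built term by term (moderate z only).
    """
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(z * z / 4.0) / ((k + 1) * (k + n / 2.0))
    return total


def sphere_average_mc(n, z, samples=100_000, seed=0):
    """Monte Carlo mean of cos(z * lambda_1) for lambda uniform on the unit sphere in n-space."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        acc += math.cos(z * g[0] / math.sqrt(sum(v * v for v in g)))
    return acc / samples


print(isotropic_kernel(3, 2.0), math.sin(2.0) / 2.0)
print(isotropic_kernel(4, 3.0), sphere_average_mc(4, 3.0, samples=50_000))
```

Only the real part $\cos(z\lambda_1)$ is averaged because the imaginary part vanishes by the symmetry of the sphere.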

In certain applications one is interested not in invariance with respect to the whole orthogonal group but with respect to a subgroup of it. A similar but slightly more complicated result can be deduced in this case.


1.9. Operations on Stationary Processes

1.9 A. Linear operations. Let $\mathfrak{F}$ be a linear filter through which an input $I(t)$ is passed. We can characterize $\mathfrak{F}$ by its frequency response function $\gamma(\lambda)$ or its transient response function $g(t)$. The function $\gamma(\lambda)$ indicates that the output is equal to $\gamma(\lambda)e^{it\lambda}$ when $I(t) = e^{it\lambda}$. Its absolute value $|\gamma(\lambda)|$ measures the amplification of $I$ at frequency $\lambda$, and its argument $\arg\gamma(\lambda)$ is the phase shift of $e^{it\lambda}$ on passing through the filter. The function $g(t)$ is the output at time $t$ when $I(t) = \delta(t)$, a unit impulse at $t = 0$. It is physically plausible that $g(t) = 0$ for $t<0$, and we shall assume that this is so. If $g(t)$ is integrable,

$$\gamma(\lambda) = \int_0^\infty g(t)\,e^{-it\lambda}\,dt.$$

It is sometimes convenient to use filters that do not correspond to a proper function. One may, e.g., be interested in the $\delta$-function and its derivatives. If the input is a weakly stationary process $x_t$ with the spectral representation

$$x_t = \int_{-\infty}^{\infty} e^{it\lambda}\,dz(\lambda),$$

then the output is given by

$$y_t = \int_{-\infty}^{\infty} e^{it\lambda}\gamma(\lambda)\,dz(\lambda),$$

so that $dF_y(\lambda) = |\gamma(\lambda)|^2\,dF_x(\lambda)$.

In order that these equations have a meaning we clearly have to assume that

$$\int_{-\infty}^{\infty}|\gamma(\lambda)|^2\,dF_x(\lambda) < \infty,$$

i.e., the average output power should be finite.
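The finite-power condition is not automatic. As an illustration (an example chosen here, not taken from the text), take the differentiating filter, whose frequency response is $\gamma(\lambda)=i\lambda$, and two spectral densities of total mass one: $f_1(\lambda)=\frac{a}{\pi(a^2+\lambda^2)}$, whose covariance is $e^{-a|t|}$, and $f_2(\lambda)=\frac{2a^3}{\pi(a^2+\lambda^2)^2}$, whose covariance is $(1+a|t|)e^{-a|t|}$. The sketch computes the truncated output power $\int_{|\lambda|\le\Lambda}\lambda^2f(\lambda)\,d\lambda$: it grows without bound for $f_1$ but converges to $a^2$ for $f_2$, so only the second process has a derivative in the mean.

```python
import math


def output_power(spectral_density, cutoff, steps=200_000):
    """Midpoint-rule value of the integral of lam^2 f(lam) over |lam| <= cutoff, i.e. the power
    passed by the differentiating filter gamma(lam) = i lam when high frequencies are cut off."""
    width = 2.0 * cutoff / steps
    total = 0.0
    for j in range(steps):
        lam = -cutoff + (j + 0.5) * width
        total += lam * lam * spectral_density(lam)
    return total * width


A = 1.0  # the parameter a of the two example spectra (illustrative value)


def f_exponential(lam):
    """Spectral density of the covariance exp(-A |t|)."""
    return A / (math.pi * (A * A + lam * lam))


def f_smooth(lam):
    """Spectral density of the covariance (1 + A |t|) exp(-A |t|)."""
    return 2.0 * A**3 / (math.pi * (A * A + lam * lam) ** 2)


for cutoff in (10.0, 100.0, 1000.0):
    print(cutoff, output_power(f_exponential, cutoff), output_power(f_smooth, cutoff))
```

For $f_1$ the truncated power equals $\frac{a}{\pi}\bigl(2\Lambda-2a\arctan(\Lambda/a)\bigr)$ exactly, which makes the linear growth in the cutoff explicit.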

The situation is similar for vector processes. As an example let $x_t$ be a weakly stationary process taking $n$-vectors as values and having derivatives in the mean. Suppose that the filter corresponds to the linear operator

$$\mathfrak{L} = \{L_{ij};\ i,j = 1,2,\dots,n\},$$

where each element is of the form

$$L_{ij} = \sum_{k=1}^{m} a_{ijk}\frac{d^{p_k}}{dt^{p_k}}.$$

Then

$$x_t = \int_\Lambda e^{it\lambda}\,dz(\lambda)$$

and the filtered process is

$$y_t = \mathfrak{L}x_t = \int_\Lambda e^{it\lambda}A(\lambda)\,dz(\lambda),$$

where $A(\lambda)$ is the $n\times n$ matrix with elements

$$A_{ij}(\lambda) = \sum_{k=1}^{m} a_{ijk}(i\lambda)^{p_k}.$$

Hence $y_t$ has the spectral matrix

$$A(\lambda)\,dF(\lambda)\,A^*(\lambda);$$

compare this result with the example given in section 1.8.

Another simple linear operation is the following. Let $x_t = (x_t^{(1)}, x_t^{(2)})$ be a weakly stationary two-dimensional process with a spectrum characterized by the matrix of spectral distribution functions $F(\lambda) = \{F_{j,k}(\lambda);\ j,k = 1,2\}$. Putting $y_t = x_t^{(1)} + x_t^{(2)}$, it is immediately seen that this is also weakly stationary and has the spectral distribution function

$$F_y(\lambda) = F_{11}(\lambda) + 2\,\mathrm{Re}\,F_{12}(\lambda) + F_{22}(\lambda).$$

If the components are incoherent,

$$F_y(\lambda) = F_{11}(\lambda) + F_{22}(\lambda),$$

i.e., the spectral energies just add.

1.9 B. Amplitude distortion. We are now going to discuss some cases of nonlinear operations. First, consider the so-called amplitude distorting filter, for which the output is a function $G(x)$ of just the present value of the input,

$$y_t = G(x_t),$$

i.e., the filter operates instantaneously. Assume that $x_t$ is a real-valued normal stationary process with discrete time, mean zero and a spectral distribution function $F(\lambda)$. We have

$$R_t = Ey_sy_{s+t} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}}\exp\left(-\frac{x^2-2\rho xy+y^2}{2\sigma^2(1-\rho^2)}\right)G(x)G(y)\,dx\,dy, \qquad \rho = \frac{r_t}{r_0}.$$

Here we have assumed that

$$\int_{-\infty}^{\infty}G^2(x)\,\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx < \infty,$$

so that the new process has finite variance, and that $|\rho| = |r_t/\sigma^2| < 1$. We have put $\sigma^2 = r_0 = \int_{-\pi}^{\pi}dF(\lambda)$. But then
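$$\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}\right) = \frac{e^{-(x^2+y^2)/2}}{2\pi}\sum_{v=0}^{\infty}\rho^v h_v(x)h_v(y),$$

where $h_n(x)$ are the orthonormal Hermite polynomials (see Cramér [3], p. 133)

$$h_n(x) = \frac{(-1)^n}{\sqrt{n!}}\,e^{x^2/2}\frac{d^n}{dx^n}e^{-x^2/2}.$$

Hence

$$R_t = \sum_{v=0}^{\infty}g_v^2\,\rho^v,$$

where

$$g_v = \int_{-\infty}^{\infty}h_v\!\left(\frac{x}{\sigma}\right)G(x)\,d\Phi\!\left(\frac{x}{\sigma}\right),$$

$\Phi$ denoting the standard normal distribution function, and the change of the order of operations is easily justified.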
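The coefficients $g_v$ are easy to approximate numerically. The sketch below (function names are ours) evaluates the orthonormal Hermite polynomials by the three-term recurrence $h_{v+1}(z)=\bigl(z\,h_v(z)-\sqrt{v}\,h_{v-1}(z)\bigr)/\sqrt{v+1}$ and integrates against the normal density with a midpoint rule on a truncated range, an approximation chosen here for simplicity. For the square law $G(u)=\alpha u^2$ it should reproduce $g_0=\alpha\sigma^2$, $g_2=\sqrt2\,\alpha\sigma^2$ and $g_v=0$ otherwise; for the linear rectifier $G(u)=\alpha u$ ($u\ge0$), $G(u)=0$ ($u<0$) it should reproduce $g_0=\alpha\sigma/\sqrt{2\pi}$, $g_1=\alpha\sigma/2$, $g_2=\alpha\sigma/(2\sqrt\pi)$.

```python
import math


def hermite_coefficients(G, sigma, count, half_width=12.0, steps=24_000):
    """g_v = E[h_v(Z) G(sigma Z)] with Z standard normal, for v = 0, ..., count - 1.

    Orthonormal Hermite polynomials come from h_{v+1} = (z h_v - sqrt(v) h_{v-1}) / sqrt(v + 1);
    the expectation is a midpoint sum over |z| <= half_width (the truncation is an approximation).
    """
    dz = 2.0 * half_width / steps
    coeffs = [0.0] * count
    for j in range(steps):
        z = -half_width + (j + 0.5) * dz
        weight = G(sigma * z) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * dz
        h_prev, h = 0.0, 1.0  # h_{-1} = 0 and h_0 = 1
        for v in range(count):
            coeffs[v] += h * weight
            h_prev, h = h, (z * h - math.sqrt(v) * h_prev) / math.sqrt(v + 1)
    return coeffs


def output_covariance(coeffs, rho):
    """R_t = sum_v g_v^2 rho^v, truncated to the coefficients supplied; rho = r_t / r_0."""
    return sum(g * g * rho**v for v, g in enumerate(coeffs))


alpha, sigma = 1.5, 2.0
square_law = hermite_coefficients(lambda u: alpha * u * u, sigma, 5)
rectifier = hermite_coefficients(lambda u: alpha * u if u >= 0 else 0.0, sigma, 30)
print(square_law)
print(rectifier[:3])
```

Setting $\rho=1$ in `output_covariance` gives the Parseval sum $\sum_v g_v^2=EG^2(x_t)$, which converges slowly for the rectifier because of its kink at the origin.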

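For the case $\rho = 1$ we note that the formula still holds, as we then get from the Parseval relation

$$\sum_{v=0}^{\infty}g_v^2 = \int_{-\infty}^{\infty}G^2(x)\,\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx = R_0.$$

The formula holds similarly for $\rho = -1$.

Introducing the convolutions of $F(\lambda)$ with itself,

$$F^{*n}(\lambda) = \int_{-\pi}^{\pi}F^{*(n-1)}(\lambda-\mu)\,dF(\mu),$$

where $F^{*0}$ denotes the distribution function with unit mass at $\lambda = 0$ ($dF(\mu)$ is defined outside $(-\pi,\pi)$ by periodicity), we have

$$r_t^n = \int_{-\pi}^{\pi}e^{it\mu}\,dF^{*n}(\mu).$$

As $\sum_{v=0}^{\infty}g_v^2 < \infty$ and as $F^{*n}(\pi) - F^{*n}(-\pi) = r_0^n$, we see that

$$F_0(\lambda) = \sum_{n=0}^{\infty}g_n^2\,\frac{F^{*n}(\lambda)}{r_0^n}$$

defines a spectral distribution function in $(-\pi,\pi)$, and we can now immediately verify that

$$R_t = \int_{-\pi}^{\pi}e^{it\lambda}\,dF_0(\lambda).$$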
As we usually consider the spectrum as corresponding to the covariances, we should consider $R_t - (EG(x_t))^2$. Now

$$Ey_t = \int_{-\infty}^{\infty}G(x)\,h_0\!\left(\frac{x}{\sigma}\right)d\Phi\!\left(\frac{x}{\sigma}\right) = g_0,$$

so that the spectral distribution function of $y_t$ is

$$F_G(\lambda) = \sum_{n=1}^{\infty}g_n^2\,\frac{F^{*n}(\lambda)}{r_0^n}. \tag{1}$$

Now this is difficult to compute, but we can get good approximations in the following way. $F^{*n}(\lambda)/r_0^n$ is clearly the $n$th convolution of a distribution function corresponding to a stochastic variable $X$ in the interval $(-\pi,\pi)$, reduced modulo $2\pi$. To get an asymptotic expression we assume that this stochastic variable $X$ has a frequency function $p(\lambda)$ which is absolutely continuous. Denoting its characteristic function by $\varphi(z) = Ee^{izX}$, we see that the sum $X_1 + X_2 + \dots + X_n$ reduced modulo $2\pi$ has the frequency function

$$\pi_n(x) = \sum_{k=-\infty}^{\infty}p^{*n}(x+2\pi k), \qquad -\pi<x<\pi.$$

But according to the Poisson summation formula we then have, for $n\ge2$,

$$\pi_n(x) = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty}\varphi^n(v)\,e^{-ivx}. \tag{2}$$

But as $|\varphi(z)|\le C/|z|$ for $|z|\ge1$, we find

$$\sum_{|v|>A}|\varphi(v)|^n \le 2\sum_{v>A}\left(\frac{C}{v}\right)^n,$$

which tends to zero when $n\to\infty$ if $A > \max(C,1)$. As the other terms in (2) also tend to zero, except the one corresponding to $v = 0$ (because $|\varphi(v)|<1$ for $v\ne0$), we see that $\pi_n(x)$ converges uniformly to the constant $1/2\pi$.
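The uniform convergence of $\pi_n$ can be watched numerically through (2). The sketch assumes, purely for illustration, that $X$ has the triangular density on $(-\pi/2,\pi/2)$, i.e. the distribution of the sum of two independent variables uniform on $(-\pi/4,\pi/4)$; this density is absolutely continuous, and its characteristic function $\varphi(z)=\bigl(\sin(\pi z/4)/(\pi z/4)\bigr)^2$ satisfies $|\varphi(z)|\le C/|z|$. The series (2) is truncated at $|v|\le200$, which is harmless for $n\ge2$.

```python
import math


def char_fn(v):
    """phi(v) = E e^{i v X} for X with the triangular density on (-pi/2, pi/2)."""
    if v == 0:
        return 1.0
    s = math.pi * v / 4.0
    return (math.sin(s) / s) ** 2


def wrapped_density(x, n, vmax=200):
    """pi_n(x) from (2), truncated to |v| <= vmax; phi is real and even, so only cosines remain."""
    total = 1.0 + 2.0 * sum(char_fn(v) ** n * math.cos(v * x) for v in range(1, vmax + 1))
    return total / (2.0 * math.pi)


def max_deviation_from_uniform(n, points=181):
    """Largest |pi_n(x) - 1/(2 pi)| over a grid on (-pi, pi) that includes x = 0."""
    xs = [-math.pi + 2.0 * math.pi * (k + 0.5) / points for k in range(points)]
    return max(abs(wrapped_density(x, n) - 1.0 / (2.0 * math.pi)) for x in xs)


for n in (2, 10, 50):
    print(n, max_deviation_from_uniform(n))
```

The deviation is governed by $|\varphi(\pm1)|^n$, so it decays geometrically in $n$; this is what justifies replacing the high-order terms of (1) by a uniform spectrum.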

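This enables us to get the approximation wanted by using instead of (1)

$$F_{G,\mathrm{appr}}(\lambda) = \sum_{n=1}^{p}g_n^2\,\frac{F^{*n}(\lambda)}{r_0^n} + \frac{\lambda+\pi}{2\pi}\sum_{n=p+1}^{\infty}g_n^2 = \sum_{n=1}^{p}g_n^2\,\frac{F^{*n}(\lambda)}{r_0^n} + \frac{\lambda+\pi}{2\pi}\left[\int_{-\infty}^{\infty}G^2(x)\,\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx - \sum_{n=0}^{p}g_n^2\right].$$

Taking $p = 1$ or $2$ would be adequate for most purposes.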

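Two cases of special interest will be considered. First let us deal with a linear rectifier

$$G(u) = \begin{cases}\alpha u, & u\ge0,\\ 0, & u<0.\end{cases}$$

Then

$$g_0 = \frac{\alpha\sigma}{\sqrt{2\pi}}, \qquad g_1 = \frac{\alpha\sigma}{2}, \qquad g_2 = \frac{\alpha\sigma}{2\sqrt\pi}.$$

Using the approximation given above with $p = 2$, and noting that the integral of $G^2$ equals $\alpha^2\sigma^2/2$, we get

$$F_{G,\mathrm{appr}}(\lambda) = \frac{\alpha^2}{4}F(\lambda) + \frac{\alpha^2}{4\pi\sigma^2}\int_{-\pi}^{\pi}F(\lambda-\mu)\,dF(\mu) + \frac{\alpha^2\sigma^2}{4}\left(1-\frac{3}{\pi}\right)\frac{\lambda+\pi}{2\pi}.$$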

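As a second example consider a square law device $G(u) = \alpha u^2$. Then

$$g_v = 0 \ \text{ for } v\ne0,2, \qquad g_0 = \alpha\sigma^2, \qquad g_2 = \sqrt2\,\alpha\sigma^2,$$

and we then get from (1) the exact relation

$$F_G(\lambda) = 2\alpha^2\int_{-\pi}^{\pi}F(\lambda-\mu)\,dF(\mu).$$

Compare with Rice [1].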

Now let $x_t = (x_t^{(1)}, x_t^{(2)})$ be a stationary real-valued normal process with the matrix of spectral distribution functions $F(\lambda) = \{F_{j,k}(\lambda);\ j,k = 1,2\}$, both components having variance $\sigma^2$. Consider the filter

$$\mathfrak{F}x_t = \bigl(G(x_t^{(1)}),\ Ix_t^{(2)}\bigr),$$

where $G$ is the amplitude distorting operator introduced above, and $I$ is the identity filter. Compute the cross correlation coefficient between $G(u)$ and $v$, where $u$ and $v$ have a simultaneous normal distribution with means zero, variances $\sigma^2$ and correlation coefficient $\rho$. We get

$$EG(u)v = EG(u)E[v\mid u] = \rho\,EG(u)u = \frac{\rho}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}uG(u)\,e^{-u^2/2\sigma^2}\,du = \rho\sigma g_1.$$

Applying this to the two components of $x_t$ we easily find the matrix of spectral distribution functions of the new process $y_t = \mathfrak{F}x_t$,

$$F_y(\lambda) = \begin{pmatrix}F_G(\lambda) & \dfrac{g_1}{\sigma}F_{12}(\lambda)\\[6pt] \dfrac{g_1}{\sigma}F_{21}(\lambda) & F_{22}(\lambda)\end{pmatrix},$$

where $F_G(\lambda)$ is given by (1) (see Bussgang [1]). Note that if $G(u)$ is symmetric, i.e., $G(-u) = G(u)$, then $g_1 = 0$ and the two components of the noise $y_t$ are incoherent. Also note that a similar result holds for all linear processes.

Considering the same process $x_t$, let us now define $y_t$ as the one-dimensional process $x_t^{(1)}x_t^{(2)}$. We have

$$Ey_t = r_{1,2}(0)$$

and

$$Ey_sy_{s+t} = Ex_s^{(1)}x_{s+t}^{(2)}\,Ex_s^{(2)}x_{s+t}^{(1)} + Ex_s^{(1)}x_{s+t}^{(1)}\,Ex_s^{(2)}x_{s+t}^{(2)} + Ex_s^{(1)}x_s^{(2)}\,Ex_{s+t}^{(1)}x_{s+t}^{(2)} = r_{1,2}(t)\,r_{1,2}(-t) + r_{1,1}(t)\,r_{2,2}(t) + [r_{1,2}(0)]^2.$$

The covariances of $y_t$ are then given by the first two terms of this expression. The spectral distribution function of $y_t$ can then be written in terms of convolutions

$$F_y(\lambda) = \int_{-\pi}^{\pi}F_{12}(\lambda-\mu)\,dF_{21}(\mu) + \int_{-\pi}^{\pi}F_{11}(\lambda-\mu)\,dF_{22}(\mu).$$

1.9 C. Rounding off. Consider a normal stationary process $x_t$ with spectral distribution function $F(\lambda)$. After rounding off we obtain a new process $y_t$. If the rounding off unit is $h$, we have $y_t = G(x_t)$, where $G(x) = vh$ if $vh\le x<(v+1)h$. We are interested in the change of the spectrum caused by this operation.

If $F_r(\lambda)$ is the spectral distribution function of the new process, we have (see formula (1.9 B 1))

$$F_r(\lambda) = \sum_{k=1}^{\infty}g_k^2\,\frac{F^{*k}(\lambda)}{r_0^k} \tag{3}$$

with

$$g_k = \int_{-\infty}^{\infty}h_k\!\left(\frac{x}{\sigma}\right)G(x)\,d\Phi\!\left(\frac{x}{\sigma}\right)$$

and $r_0 = \sigma^2 = \int_{-\pi}^{\pi}dF(\lambda)$.
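Writing $G(x) = x - \frac{h}{2} + \Delta_h(x)$, we get

$$g_k = \sigma\delta_{k1} + \int_{-\infty}^{\infty}\Delta_h(x)\,h_k\!\left(\frac{x}{\sigma}\right)d\Phi\!\left(\frac{x}{\sigma}\right).$$

Now

$$\int_{-\infty}^{\infty}\Delta_h(x)\,g(x)\,dx = -\sum_{v=-\infty}^{\infty}\int_{vh}^{(v+1)h}\bigl[x-(v+\tfrac12)h\bigr]\Bigl[g\bigl((v+\tfrac12)h\bigr) + \bigl(x-(v+\tfrac12)h\bigr)g'\bigl((v+\tfrac12)h\bigr) + R_v\Bigr]dx,$$

where

$$R_v = \frac{\bigl(x-(v+\frac12)h\bigr)^2}{2}\,g''(\theta_v)$$

for some $\theta_v$ between $x$ and $(v+\frac12)h$. Hence

$$g_k = \sigma\delta_{k1} - \frac{h^3}{12}\sum_{v=-\infty}^{\infty}\Bigl[g'\bigl((v+\tfrac12)h\bigr) + o(1)\Bigr],$$

with

$$g(x) = h_k\!\left(\frac{x}{\sigma}\right)\frac{d}{dx}\Phi\!\left(\frac{x}{\sigma}\right).$$

Now, since $h\sum_v g'\bigl((v+\frac12)h\bigr)\to\int g'(x)\,dx = 0$, it is easy to see that

$$g_k = \sigma\delta_{k1} + o(h^2), \qquad k = 1,2,\dots.$$

The order terms $o(h^2)$ do not hold uniformly in $k$, so that we cannot conclude that $F_r(\lambda) - F(\lambda) = o(h^2)$. In fact we know that

$$\int_{-\pi}^{\pi}dF_r(\lambda) = \int_{-\pi}^{\pi}dF(\lambda) + \frac{h^2}{12}\bigl(1+o(1)\bigr),$$

which is simply Sheppard's correction (see, e.g., Cramér [3], pp. 361-2).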
But for any $p>0$ we have, by integrating (3),

$$\int_{-\pi}^{\pi}dF_r(\lambda) = \sum_{k=1}^{\infty}g_k^2 = \sigma^2 + o(h^2) + \sum_{k=p+1}^{\infty}g_k^2 = \sigma^2 + \frac{h^2}{12}\bigl(1+o(1)\bigr),$$

and we get

$$\sum_{k=p+1}^{\infty}g_k^2 = \frac{h^2}{12}\bigl(1+o(1)\bigr).$$

Using the limit relation proved in section 1.9 B, we see that

$$F_r(\lambda) = F(\lambda) + \frac{h^2}{12}\cdot\frac{\lambda+\pi}{2\pi}\bigl(1+o(1)\bigr).$$

This means that by rounding off we superimpose incoherent white noise of spectral density $h^2/24\pi$ on the original process, for small values of the rounding off unit $h$. See also Tukey [1].
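Sheppard's correction is easy to confirm for a single normal variable. The sketch below (an illustration, not the authors' computation) rounds $x\sim N(0,\sigma^2)$ down to the grid $vh$ as above, computes the exact mean and variance of $G(x)$ from the cell probabilities, and compares them with $-h/2$ and $\sigma^2+h^2/12$. For the normal distribution the remaining discrepancy is exponentially small in $\sigma^2/h^2$, so the agreement is essentially exact once $h$ is moderately small compared with $\sigma$.

```python
import math


def normal_cdf(x, sigma):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def rounded_moments(sigma, h, span=10.0):
    """Exact mean and variance of G(x) = v h (v h <= x < (v + 1) h) for x ~ N(0, sigma^2).

    Cells beyond span standard deviations carry negligible probability and are skipped.
    """
    vmax = int(span * sigma / h) + 1
    mean = second = 0.0
    for v in range(-vmax, vmax):
        p = normal_cdf((v + 1) * h, sigma) - normal_cdf(v * h, sigma)
        mean += v * h * p
        second += (v * h) ** 2 * p
    return mean, second - mean**2


for sigma, h in ((1.0, 0.5), (1.0, 0.25), (2.0, 0.5)):
    mean, var = rounded_moments(sigma, h)
    print(sigma, h, mean, var - sigma**2, h**2 / 12.0)
```

The mean shift $-h/2$ only affects $g_0$, which is why it does not enter the spectral distribution (3).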

1.9 D. Periodic sampling. Let $x_t$, $-\infty<t<\infty$, be a weakly stationary process with a spectral density $f(\lambda)$. In many cases it is not possible or convenient to observe $x_t$ for each $t\in T$, but only at some points $t_v$. If these are chosen as lattice points $t_v = vh$, we say that $x_t$ has been sampled periodically. Denoting the covariances of the new process by $r_{vh}$, we have

$$r_{vh} = Ex_{s+vh}\overline{x_s} = \int_{-\infty}^{\infty}e^{ivh\lambda}f(\lambda)\,d\lambda = \int_{-\pi}^{\pi}e^{iv\lambda}f_h(\lambda)\,d\lambda,$$

where

$$f_h(\lambda) = \frac{1}{h}\sum_{n=-\infty}^{\infty}f\!\left(\frac{\lambda+2\pi n}{h}\right), \qquad -\pi<\lambda\le\pi.$$

The sum converges for almost every $\lambda$ to an integrable function in $(-\pi,\pi)$. Hence the new spectral density is $f_h(\lambda)$. The spectral density corresponding to a certain $\lambda$ is put together from contributions from the frequencies $\lambda/h + 2\pi n/h$. If $h$ is small only the term for $n = 0$ matters, but otherwise we note that generally it is not possible to reconstruct $f(\lambda)$ completely even if we know $f_h(\lambda)$ exactly. Hence, if we want to use periodic sampling to get information concerning the spectrum, we must choose $h$ so small that the range of frequencies we believe to be significant is included in the interval $(-\pi/h, \pi/h)$. In some cases of interest, however, random noise originating perhaps from rounding off errors or from the physical device used masks the higher frequencies, so that little is gained by making $h$ very small. Note that if the spectrum of the process (assumed real-valued) is included in the range $(-\pi/h, \pi/h)$, the spectrum is determined by the covariances $r_{vh}$. If the process and its first $k$ derivatives are observed at points $vh$, the spectrum can be reconstructed from the covariance structure if it is in the range $(-\pi(k+1)/h, \pi(k+1)/h)$.
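The folding formula for $f_h$ can be checked on a case with a closed form. Assume, for illustration only, the continuous-time spectral density $f(\lambda)=\frac{a}{\pi(a^2+\lambda^2)}$, whose covariance is $e^{-a|t|}$. Sampled with spacing $h$, the covariances become $\rho^{|v|}$ with $\rho=e^{-ah}$, so $f_h$ must equal the discrete spectral density $\frac{1}{2\pi}\frac{1-\rho^2}{1-2\rho\cos\lambda+\rho^2}$. The sketch truncates the sum over $n$, which for this slowly decaying $f$ converges only like $1/N$.

```python
import math


def continuous_density(lam, a):
    """f(lam) = a / (pi (a^2 + lam^2)), the spectral density of the covariance exp(-a |t|)."""
    return a / (math.pi * (a * a + lam * lam))


def folded_density(lam, h, a, n_max=100_000):
    """f_h(lam) = (1/h) sum_n f((lam + 2 pi n) / h), truncated to |n| <= n_max."""
    total = sum(continuous_density((lam + 2.0 * math.pi * n) / h, a) for n in range(-n_max, n_max + 1))
    return total / h


def sampled_ar1_density(lam, h, a):
    """Spectral density on (-pi, pi) of the sampled covariances rho^{|v|}, rho = exp(-a h)."""
    rho = math.exp(-a * h)
    return (1.0 - rho * rho) / (2.0 * math.pi * (1.0 - 2.0 * rho * math.cos(lam) + rho * rho))


lam = 3.0
for h in (0.5, 2.0):
    # the last column keeps only n = 0, i.e. it ignores aliasing
    print(h, folded_density(lam, h, 1.0), sampled_ar1_density(lam, h, 1.0),
          continuous_density(lam / h, 1.0) / h)
```

For the coarse spacing $h=2$ the aliased terms $n\ne0$ contribute more than half of $f_h(3)$, which illustrates why $h$ must be chosen small relative to the significant frequencies.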

If we observe instead certain mean values taken at periodic time points,

$$y_v = \frac{1}{h}\int_{(v-1)h}^{vh}x_t\,dt,$$

we find easily that the $y$-process has the spectral density

$$f_y(\lambda) = \frac{4}{h}\sin^2\frac{\lambda}{2}\sum_{n=-\infty}^{\infty}\frac{f\!\left(\frac{\lambda+2\pi n}{h}\right)}{(\lambda+2\pi n)^2}.$$


1.9 E. Random sampling. Sometimes we may sample the process at time points $t$ chosen at random. Let $x_t$ be a real-valued stationary process. Consider a sequence $\dots, z_{-1}, z_0, z_1, \dots$ of independent positive stochastic variables with the same distribution, independent of the process $x_t$. Put

$$t_v = t_{v-1} + z_v, \qquad v = \dots,-1,0,1,\dots.$$

To determine the probability distributions of $t_v$ we either fix $t_0$ as $0$, say, or we can let it have some arbitrary probability distribution. Then

$$y_v = x_{t_v}, \qquad v = \dots,-1,0,1,\dots$$

is also stationary, with the covariances

$$Ey_vy_\mu = Ex_{t_v}x_{t_\mu} = E\bigl[E(x_{t_v}x_{t_\mu}\mid t_v, t_\mu\ \text{fixed})\bigr] = Er_{t_v-t_\mu}.$$

But as

$$r_{t_v-t_\mu} = \int_{-\infty}^{\infty}e^{i(t_v-t_\mu)\lambda}\,dF(\lambda),$$

we get, for $v\ge\mu$,

$$Ey_vy_\mu = \int_{-\infty}^{\infty}[\varphi(\lambda)]^{v-\mu}\,dF(\lambda),$$

where $\varphi(\lambda)$ is the characteristic function $\varphi(\lambda) = Ee^{i\lambda z}$.
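To determine the spectrum of the new process define

$$P_\lambda(a) = \frac{1}{2\pi}\int_{-\pi}^{\lambda}\frac{1-|a|^2}{|1-ae^{-i\xi}|^2}\,d\xi \tag{4}$$

if $|a|<1$, and, if $|a| = 1$, $P_\lambda(a)$ as the distribution function on $(-\pi,\pi)$ having all its mass at the point $\lambda = \theta$, where $a = e^{i\theta}$, $-\pi<\theta\le\pi$. In any case $P_\lambda(a)$ is a distribution function in $\lambda$. Also

$$\int_{-\pi}^{\pi}e^{iv\lambda}\,dP_\lambda(a) = \begin{cases}a^v, & v\ge0,\\ \bar a^{|v|}, & v<0.\end{cases}$$

We can always define a bounded nondecreasing function

$$G(\lambda) = \int_{-\infty}^{\infty}P_\lambda[\varphi(x)]\,dF(x).$$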
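$G(\lambda)$ is the spectral distribution function of $y_v$, as for $v\ge0$

$$\int_{-\pi}^{\pi}e^{iv\lambda}\,dG(\lambda) = \int_{-\infty}^{\infty}\int_{-\pi}^{\pi}e^{iv\lambda}\,d_\lambda P_\lambda[\varphi(x)]\,dF(x) = \int_{-\infty}^{\infty}[\varphi(x)]^v\,dF(x) = Ey_{k+v}y_k.$$

If $z = h\zeta$, where $E\zeta = 1$, it is of interest to consider $G(h\lambda)$ and its limit when $h\to0$. Consider the characteristic function

$$\int_{-\pi/h}^{\pi/h}e^{i\gamma\lambda}\,d_\lambda G(\lambda h) = \int_{-\pi}^{\pi}e^{i\gamma\lambda/h}\,dG(\lambda)$$

of $G(\lambda h)$. Letting $r = \gamma/h$ be an integer, we see that

$$\int_{-\pi}^{\pi}e^{i\gamma\lambda/h}\,dG(\lambda) = \int_{-\infty}^{\infty}[\varphi(xh)]^{\gamma/h}\,dF(x),$$

where now $\varphi$ denotes the characteristic function of $\zeta$. But $\varphi(xh) = 1 + ixh + o(h)$ when $|x|<h^{-1/2}$. On letting $r\to\infty$, the limiting value is

$$\int_{-\infty}^{\infty}e^{i\gamma x}\,dF(x).$$

But we get the same limiting value no matter how $h\to0$. If $\gamma/(r+1)<h\le\gamma/r$, then

$$\left|\int_{-\pi}^{\pi}\bigl(e^{ir\lambda}-e^{i\gamma\lambda/h}\bigr)\,dG(\lambda)\right| \le \int_{-\pi}^{\pi}|\lambda|\,dG(\lambda).$$

But $G(\lambda)$ tends to the distribution function with all its mass at zero when $h\to0$, so that the difference tends to zero and the same limiting value is obtained no matter how $h\to0$. The limit of the characteristic function of $G(h\lambda)$ is thus the characteristic function of $F(\lambda)$. Thus $G(h\lambda)$ approaches $F(\lambda)$ as $h\to0$, as one might expect intuitively.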
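The covariance formula $Ey_{k+v}y_k=\int[\varphi(x)]^v\,dF(x)$ can be checked in a case where both sides are explicit. Assume, for illustration, that $x_t$ has covariance $e^{-a|t|}$, i.e. spectral density $\frac{a}{\pi(a^2+\lambda^2)}$, and that the gaps $z$ are exponential with rate $b$, so that $\varphi(\lambda)=b/(b-i\lambda)$. Then directly $Ey_{k+v}y_k=Ee^{-a(z_1+\dots+z_v)}=\bigl(b/(a+b)\bigr)^v$. The sketch evaluates the spectral integral after the substitution $\lambda=a\tan\theta$, which turns $dF$ into $d\theta/\pi$ on $(-\pi/2,\pi/2)$; the function names are ours.

```python
import math


def char_fn_exponential(lam, b):
    """phi(lam) = E e^{i lam z} for an exponential gap z with rate b."""
    return b / (b - 1j * lam)


def sampled_covariance_spectral(v, a, b, steps=20_000):
    """Integral of phi(lam)^v dF(lam) with dF = a / (pi (a^2 + lam^2)) d lam.

    With lam = a tan(theta), dF becomes d theta / pi on (-pi/2, pi/2); the midpoint rule is used.
    """
    width = math.pi / steps
    total = 0j
    for j in range(steps):
        theta = -math.pi / 2.0 + (j + 0.5) * width
        total += char_fn_exponential(a * math.tan(theta), b) ** v
    return (total * width / math.pi).real


def sampled_covariance_direct(v, a, b):
    """E exp(-a (z_1 + ... + z_v)) = (b / (a + b))^v for independent exponential gaps."""
    return (b / (a + b)) ** v


for v in range(4):
    print(v, sampled_covariance_spectral(v, 1.0, 2.0), sampled_covariance_direct(v, 1.0, 2.0))
```

The sampled sequence is again first-order autoregressive, with correlation $b/(a+b)$ per step, so random sampling with exponential gaps preserves the exponential form of the covariance.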

1.10. Harmonizable Stochastic Processes

Although we are mainly concerned with stationary processes in this book, we will briefly discuss a more general class of processes.

Suppose that $z(S)$ is an additive stochastic set function defined for each bounded Borel set $S$ on the real axis, not necessarily orthogonal, but such that

$$Ez(S) = 0, \qquad Ez(S_1)\overline{z(S_2)} = \int_{S_1}\int_{S_2}dF(\lambda,\mu), \tag{1}$$

where $F(\lambda,\mu)$ is a function of bounded variation. Also we assume that $F(\lambda,\mu)$ defines a nonnegative definite set function, i.e.,

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}c(\lambda)\overline{c(\mu)}\,dF(\lambda,\mu) \ge 0$$

for each continuous function $c(\lambda)$ vanishing outside some finite interval. Then one can define integrals of the type

$$x_t = \int_{-\infty}^{\infty}e^{it\lambda}\,dz(\lambda) \tag{2}$$

and show that such a process has the covariance function

$$r_{s,t} = Ex_s\overline{x_t} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{i(s\lambda-t\mu)}\,dF(\lambda,\mu). \tag{3}$$

Conversely, if a process has a covariance of the type in (3), where $F(\lambda,\mu)$ is nonnegative definite, one can show that the process can be represented in the form (2), with $z(S)$ having the properties (1). These processes are called harmonizable. The weakly stationary processes form a subset of this class and correspond to $F(\lambda,\mu)$ having all its mass concentrated on the main diagonal $\lambda = \mu$. For a more detailed study of these processes see Loève [1] and Cramér [4].
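A discrete special case shows the role of the off-diagonal mass. Suppose, hypothetically, that $x_t=\sum_j Z_je^{it\lambda_j}$ with finitely many frequencies and $EZ_j\overline{Z_k}=C_{jk}$, so that $F(\lambda,\mu)$ puts mass $C_{jk}$ at the point $(\lambda_j,\lambda_k)$ and (3) becomes a finite double sum. When $C$ is diagonal (mass only on $\lambda=\mu$) the covariance depends on $s-t$ only; off-diagonal mass destroys stationarity.

```python
import cmath


def harmonizable_covariance(s, t, freqs, cov):
    """r_{s,t} = sum_{j,k} e^{i (s lam_j - t lam_k)} C_{jk}: formula (3) when F(lam, mu) has
    point masses C_{jk} at (lam_j, lam_k)."""
    return sum(cmath.exp(1j * (s * lam_j - t * lam_k)) * cov[j][k]
               for j, lam_j in enumerate(freqs)
               for k, lam_k in enumerate(freqs))


freqs = [0.3, 1.1]
diagonal = [[2.0, 0.0], [0.0, 1.0]]               # mass only on lambda = mu: weakly stationary
coupled = [[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]]  # Hermitian, nonnegative definite, off-diagonal mass
print(harmonizable_covariance(0, 0, freqs, coupled), harmonizable_covariance(7, 7, freqs, coupled))
```

With the coupled amplitudes even the variance $r_{t,t}$ changes with $t$, so the process is harmonizable but not stationary.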
CHAPTER 2

STATISTICAL QUESTIONS WHEN THE SPECTRUM
IS KNOWN (LEAST SQUARES THEORY)

2.0. Introduction

This chapter is devoted to linear problems when the spectrum (or the covariance function) of the stationary process is known; we especially study the question of how to construct the optimal linear predictor. Our exposition is based upon the original paper by Kolmogoroff [2] and Doob's book [2]. Wiener [1] should be consulted for a somewhat different approach via generalized harmonic analysis. The two methods are equivalent for many practical purposes. A third approach using the theory of Toeplitz forms is also possible (see Grenander [3]).

This chapter is somewhat more formal than the rest of the book and uses mathematical machinery that may be unfamiliar to the reader. To make the exposition complete in this respect we have added an Appendix where the reader will find statements and proofs of some results in complex variable theory used in this chapter.

It should be noted that we seldom know the spectrum a priori and usually have to estimate it from observed realizations of the process. How this should be done is studied in Chapters 4 and 8.

2.1. Preliminaries

Let $x_t$, $Ex_t = 0$, $t\in T$, be a weakly stationary (see section 1.5) stochastic process. In many statistical contexts a set of stochastic variables $x_t$, $t\in I\subset T$, has been observed, and one wishes to estimate a stochastic variable $x_t$, $t\in T - I$, by a linear combination of the observed stochastic variables that is best in the sense of smallest mean square error. This problem can be rephrased in a convenient geometric language.

Consider the set of all finite linear combinations $\sum_\nu c_\nu x_{t_\nu}$, $t_\nu \in T$, where the $c_\nu$
are complex constants. This is a linear manifold $M$, that is, given elements
$y, z \in M$ and complex numbers $\alpha, \beta$,

$$\alpha y + \beta z$$

is also in $M$. Add to $M$ all stochastic variables that are limits in the mean
square of elements of $M$. We then obtain the closed linear manifold $\mathfrak{M}$. Let
the inner product of two elements $x, y \in \mathfrak{M}$ be

$$(x, y) = E x \bar{y}.$$

The length of an element $x \in \mathfrak{M}$ can then be defined as

$$\|x\| = \sqrt{(x, x)} = \sqrt{E|x|^2}.$$

Clearly the length of an element $x$ is zero if and only if $x = 0$ with probability
one. Random variables equal with probability one are to be identified with
each other.

The inner product $(x, y)$ is a linear function of $x$ for fixed $y$ and

$$(x, y) = \overline{(y, x)}.$$

The Schwarz inequality implies that

$$|(x, y)| \le \|x\| \cdot \|y\|.$$

The distance between two elements $x, y$ is defined as $\|x - y\|$.
This distance satisfies the triangle inequality, that is, given the elements
$x, y$,

$$\|x + y\| \le \|x\| + \|y\|.$$

This is simply Minkowski's inequality (see e.g. Titchmarsh [1]). We then have
a linear vector space $\mathfrak{M}$ that is complete in the norm. This means that if
$x_n \in \mathfrak{M}$ is a Cauchy sequence, that is, $\|x_n - x_m\| \to 0$ when $n, m \to \infty$,
then the limit in the mean square of $x_n$ also belongs to $\mathfrak{M}$, as is easily seen.
Consider the closed linear manifold $\mathfrak{N}$ generated by the stochastic variables
$x_t$, $t \in I \subset T$. This is a linear vector space complete in the norm and is a
subspace of $\mathfrak{M}$. The case of greatest interest is that in which $\mathfrak{N}$ is a proper
subspace of $\mathfrak{M}$, that is, $\mathfrak{N} \ne \mathfrak{M}$. We shall refer to $\mathfrak{N}$ as the observed space. We
wish to estimate an element $x \in \mathfrak{M}$, not necessarily in $\mathfrak{N}$, by an element
$x_e \in \mathfrak{N}$ which is best in the sense of smallest mean square error, that is,

$$\|x - x_e\| = \min_{y \in \mathfrak{N}} \|x - y\|.$$

Let $\delta = \inf_{y \in \mathfrak{N}} \|x - y\|$. We shall show that the infimum is attained, that is,
that there is an element $x_e \in \mathfrak{N}$ such that

$$\|x - x_e\| = \delta.$$

Let $y_n$ be a sequence of elements of $\mathfrak{N}$ such that $\|x - y_n\| \to \delta$. It follows
from the parallelogram law (for any $x, y \in \mathfrak{M}$, $\|x - y\|^2 + \|x + y\|^2 =
2\|x\|^2 + 2\|y\|^2$) that

$$\|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\left\|\tfrac{1}{2}(y_n + y_m) - x\right\|^2.$$

But $\tfrac{1}{2}(y_n + y_m) \in \mathfrak{N}$, so that $\|\tfrac{1}{2}(y_n + y_m) - x\| \ge \delta$ and

$$\|y_n - y_m\|^2 \le 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\delta^2.$$

As $n, m \to \infty$ the right side of this inequality tends to zero so that $y_n$ is a
Cauchy sequence. Because of the completeness of $\mathfrak{M}$ and $\mathfrak{N}$ there is an
$x_e \in \mathfrak{N}$ such that $y_n \to x_e$, and by continuity of the norm $\|x_e - x\| = \delta$.

We shall also show that $x - x_e$ is orthogonal to $\mathfrak{N}$. The element $x_e + \alpha y \in \mathfrak{N}$
for every $y \in \mathfrak{N}$ and every complex number $\alpha$. Thus

$$\|x - x_e - \alpha y\| \ge \delta$$

and

$$0 \le \|x - x_e - \alpha y\|^2 - \|x - x_e\|^2 = -\bar{\alpha}(x - x_e, y) - \alpha(y, x - x_e) + |\alpha|^2 \|y\|^2.$$

Let $\alpha = \beta (x - x_e, y)$ where $\beta$ is a real number. Then

$$0 \le -2\beta |(x - x_e, y)|^2 + \beta^2 |(x - x_e, y)|^2 \|y\|^2.$$

This cannot be true for small positive values of $\beta$ unless $(x - x_e, y) = 0$.
Thus $x - x_e \perp \mathfrak{N}$. This implies that any $x \in \mathfrak{M}$ can be written in the form
$x = z_1 + z_2$ where $z_1 \in \mathfrak{N}$, $z_2 \perp \mathfrak{N}$. It is easily seen that this representation is
unique. The element $z_1 = x_e$ of $\mathfrak{N}$ is called the projection of $x$ on $\mathfrak{N}$ and is the
unique element of $\mathfrak{N}$ such that

$$\|x - z_1\| = \min_{y \in \mathfrak{N}} \|x - y\|.$$
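A finite-dimensional analogue makes the projection argument concrete. The sketch below is only an illustration under an assumption not made in the text: the sample space has N equally likely points, so that a stochastic variable is a complex vector and $(x, y) = E x\bar{y}$ is an average. The helper names `inner`, `solve` and `project` are hypothetical. The projection $x_e$ is computed from the normal equations $\sum_j \beta_j (b_j, b_i) = (x, b_i)$, and the code checks that $x - x_e$ is orthogonal to the generators of $\mathfrak{N}$.

```python
import random

def inner(x, y):
    """(x, y) = E x conj(y) on a sample space of len(x) equally likely points."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) / len(x)

def norm(x):
    return abs(inner(x, x)) ** 0.5

def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    beta = [0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / M[i][i]
    return beta

def project(x, basis):
    """Projection of x on span(basis), from the normal equations
    sum_j beta_j (b_j, b_i) = (x, b_i) for every generator b_i."""
    G = [[inner(bj, bi) for bj in basis] for bi in basis]
    r = [inner(x, bi) for bi in basis]
    beta = solve(G, r)
    return [sum(beta[j] * basis[j][k] for j in range(len(basis))) for k in range(len(x))]

random.seed(1)
N = 50  # number of equally likely sample points (assumed)

def random_variable():
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

basis = [random_variable() for _ in range(3)]   # generators of the "observed space"
x = random_variable()
x_e = project(x, basis)
residual = [a - b for a, b in zip(x, x_e)]

# x - x_e is orthogonal to every generator of the observed space ...
print(max(abs(inner(residual, b)) for b in basis))
# ... so replacing x_e by x_e + alpha*y, y in the space, cannot reduce the error.
alpha = 0.3 - 0.2j
worse = [rk - alpha * bk for rk, bk in zip(residual, basis[0])]
print(norm(residual) <= norm(worse))
```

The last check mirrors the expansion used above: because $(x - x_e, y) = 0$, the squared error of $x_e + \alpha y$ exceeds that of $x_e$ by exactly $|\alpha|^2\|y\|^2$.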

In some problems, it is convenient to introduce another linear vector space
isomorphic to the space $\mathfrak{M}$ generated by the stochastic process $x_t$. By an
isomorphism we mean a mapping that takes each element of one space into
a corresponding element of the second and vice versa, that is linear, and
that preserves the norm of the elements mapped. In heuristic language we
could say that these spaces are two different representations of the same
geometry. The isomorphism is here considered for a process with an integral-valued
parameter but it is quite analogous for other parameter sets. This
isomorphism is implied by the spectral representation of a stationary
process obtained in section 1.6. Using this representation we see that $\mathfrak{M}$
consists of elements of the form

$$x = \int_{-\pi}^{\pi} \sum_{t=-n}^{n} c_t e^{it\lambda}\, dz(\lambda),$$

where $n$ is finite and the $c$'s are complex numbers, as well as all
limits in the mean of these elements. Since

$$E|x|^2 = \int_{-\pi}^{\pi} \Big|\sum_{t=-n}^{n} c_t e^{it\lambda}\Big|^2 dF(\lambda),$$

where $F(\lambda)$ is the spectral distribution function of the process, it
follows that $\mathfrak{M}$ consists of all elements of the form

$$\int_{-\pi}^{\pi} g(\lambda)\, dz(\lambda), \qquad \text{where } \int_{-\pi}^{\pi} |g(\lambda)|^2\, dF(\lambda) < \infty.$$

Consider the space $L^2(dF)$ of all functions $g(\lambda)$ quadratically integrable
with respect to $dF(\lambda)$. Functions $g(\lambda)$ differing only on a set of $F$-measure
zero are to be identified with each other. The inner product of two elements
$h, g \in L^2(dF)$ is defined as

$$(h, g) = \int_{-\pi}^{\pi} h(\lambda)\, \overline{g(\lambda)}\, dF(\lambda).$$

$L^2(dF)$ is a linear vector space complete in the norm

$$\|g\| = \sqrt{(g, g)}.$$

If $x = \int_{-\pi}^{\pi} h(\lambda)\, dz(\lambda)$ is an element of $\mathfrak{M}$, it is clear that a 1-1 correspondence
is set up between the elements of $\mathfrak{M}$ and $L^2(dF)$ by making $x \in \mathfrak{M}$ correspond to
$h(\lambda) \in L^2(dF)$. In particular, $x_n$ corresponds to $e^{in\lambda}$. The mapping is obviously
linear. The inner product is left invariant under the mapping since

$$(x, y) = E x \bar{y} = \int_{-\pi}^{\pi} h(\lambda)\, \overline{g(\lambda)}\, dF(\lambda) = (h, g),$$

where $x = \int_{-\pi}^{\pi} h(\lambda)\, dz(\lambda)$, $y = \int_{-\pi}^{\pi} g(\lambda)\, dz(\lambda)$ are any two elements of $\mathfrak{M}$. This
correspondence is thus an isomorphism and the two spaces have the same
geometrical character. The problems of this chapter are geometrical problems
since they are least squares problems. When considering a least squares
problem in $\mathfrak{M}$, it may sometimes be more convenient to solve the corresponding
problem in $L^2(dF)$. The solution of the problem in $\mathfrak{M}$ will be the isomorph
of the solution of the corresponding problem in $L^2(dF)$.
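The isometry $(x, y) = (h, g)$ can be checked numerically. In the sketch below, which is an illustration and not part of the original argument, the process is assumed to be an AR(1) scheme $x_n = a x_{n-1} + \epsilon_n$ with unit-variance white noise $\epsilon_n$, so that $r(k) = E x_{n+k}\bar{x}_n = a^{|k|}/(1 - a^2)$ and $f(\lambda) = (2\pi)^{-1}|1 - a e^{-i\lambda}|^{-2}$. The coefficients $c_t$ are arbitrary. The two sides of $E|x|^2 = \int |h|^2 dF$ for $x = \sum c_t x_t$, $h(\lambda) = \sum c_t e^{it\lambda}$ are computed independently, one from the covariances and one by quadrature in $L^2(dF)$.

```python
import cmath
import math

a = 0.6  # AR(1) coefficient, assumed for this illustration

def r(k):
    """Covariance r(k) = E x_{n+k} conj(x_n) of the AR(1) process with unit innovations."""
    return a ** abs(k) / (1 - a * a)

def f(lam):
    """Spectral density of the same process."""
    return 1 / (2 * math.pi * abs(1 - a * cmath.exp(-1j * lam)) ** 2)

c = {-2: 1.0, 0: 0.5 - 1j, 3: 2.0}   # x = sum_t c_t x_t  <->  h(lam) = sum_t c_t e^{it lam}

# (x, x) = E|x|^2 computed from the covariances in the "stochastic" space
lhs = sum(cs * ct.conjugate() * r(s - t)
          for s, cs in c.items() for t, ct in c.items()).real

# (h, h) in L^2(dF): rectangle rule, very accurate for smooth periodic integrands
M = 4096
step = 2 * math.pi / M
rhs = 0.0
for k in range(M):
    lam = -math.pi + k * step
    h = sum(ct * cmath.exp(1j * t * lam) for t, ct in c.items())
    rhs += abs(h) ** 2 * f(lam) * step

print(lhs, rhs)
```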

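We now derive a simple and useful inequality. Suppose that the spaces
$L^2(dF)$, $L^2(dG)$ are given with $0 \le dG(\lambda) \le dF(\lambda)$. Let $\mathfrak{N}(dF)$, $\mathfrak{N}(dG)$ be
the closed linear submanifolds of $L^2(dF)$ and $L^2(dG)$, respectively, generated
by the set $\{e^{in\lambda};\ n \in I\}$ where $I$ is an index set. Let the function $h(\lambda)$ be
an element of both $L^2(dG)$ and $L^2(dF)$. Consider the errors of approximation

$$\sigma_F^2 = \inf_{a \in \mathfrak{N}(dF)} \int_{-\pi}^{\pi} |h(\lambda) - a(\lambda)|^2\, dF(\lambda),$$

$$\sigma_G^2 = \inf_{a \in \mathfrak{N}(dG)} \int_{-\pi}^{\pi} |h(\lambda) - a(\lambda)|^2\, dG(\lambda).$$

Since $dG(\lambda) \le dF(\lambda)$, every element of $\mathfrak{N}(dF)$ is also an element of $\mathfrak{N}(dG)$,
that is, $\mathfrak{N}(dF) \subset \mathfrak{N}(dG)$. It is then clear that

$$\sigma_G \le \sigma_F,$$

since for every $a \in \mathfrak{N}(dF)$, and in particular for the function minimizing the
error of approximation in $L^2(dF)$, the error $\int |h - a|^2\, dG$ obtained when $a$ is used
as an approximation in $L^2(dG)$ cannot exceed $\int |h - a|^2\, dF$.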
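The inequality can be checked on a discretized example. In this sketch, which is only an illustration, $dF$ and $dG$ are replaced by point masses on a grid, with the $G$-masses obtained by multiplying the $F$-masses by a factor between 0 and 1. The density $f$, the factor `ratio`, the element $h(\lambda) = e^{i\lambda}$ and the index set $I = \{0, -1, -2\}$ are all assumptions, and `approx_error` is a hypothetical helper that computes each error by weighted least squares through the normal equations.

```python
import cmath
import math

def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    beta = [0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / M[i][i]
    return beta

def approx_error(h, index_set, lams, w):
    """Squared error of the best approximation of h by span{e^{in lam}: n in index_set}
    in L^2 of the discrete measure with mass w[k] at the point lams[k]."""
    basis = [[cmath.exp(1j * n * lam) for lam in lams] for n in index_set]
    hv = [h(lam) for lam in lams]

    def ip(u, v):
        return sum(wk * uk * vk.conjugate() for wk, uk, vk in zip(w, u, v))

    G = [[ip(bj, bi) for bj in basis] for bi in basis]
    beta = solve(G, [ip(hv, bi) for bi in basis])
    resid = [hk - sum(beta[j] * basis[j][k] for j in range(len(basis)))
             for k, hk in enumerate(hv)]
    return ip(resid, resid).real

M = 256
lams = [-math.pi + 2 * math.pi * k / M for k in range(M)]
f = lambda lam: (1.2 + math.cos(lam)) / (2 * math.pi)   # assumed density of F
ratio = lambda lam: 0.5 + 0.4 * math.sin(3 * lam)       # assumed dG/dF, between 0.1 and 0.9
wF = [f(lam) * 2 * math.pi / M for lam in lams]
wG = [wf * ratio(lam) for wf, lam in zip(wF, lams)]

h = lambda lam: cmath.exp(1j * lam)   # approximate e^{i lam} (one step ahead)
I = [0, -1, -2]                       # assumed index set
sigma2_F = approx_error(h, I, lams, wF)
sigma2_G = approx_error(h, I, lams, wG)
print(sigma2_G, sigma2_F)   # sigma_G^2 should not exceed sigma_F^2
```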

2.2.  Prediction 

Consider a weakly stationary process $x_t$, $Ex_t = 0$, $t = \ldots, -1, 0, 1, \ldots$.
Assume that we have observed $x_{n-m}, x_{n-m+1}, \ldots, x_n$, $m > 0$, and that we
wish to estimate $x_{n+\nu}$, $\nu > 0$, by a linear combination of the observed stochastic
variables that is best in the sense of smallest mean square error;
that is, we want to predict $\nu$ steps ahead. We shall not consider this problem directly but
rather the limiting one in which $m = +\infty$. It will be shown that the solution
given is the limit in the appropriate sense (mean square) of the solution for
large $m$ when $m \to \infty$.

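Let $\mathfrak{M}\{\ldots\}$ denote the closed linear manifold generated by the elements
in braces. The symbols $\cap$ and $\cup$ are used to denote intersection and union
of sets respectively. For convenience, we introduce the following linear
manifolds

$$\mathfrak{M}_{m,n} = \mathfrak{M}\{x_t;\ n - m \le t \le n\},$$

$$\mathfrak{M}_n = \mathfrak{M}\{x_t;\ t \le n\}, \qquad {}_n\mathfrak{M} = \mathfrak{M}\{e^{it\lambda};\ t \le n\},$$

$$\mathfrak{M}_{-\infty} = \bigcap_{n=-\infty}^{\infty} \mathfrak{M}_n, \qquad {}_{-\infty}\mathfrak{M} = \bigcap_{n=-\infty}^{\infty} {}_n\mathfrak{M},$$

$$\mathfrak{M}_\infty = \mathfrak{M}\{x_t;\ -\infty < t < \infty\}, \qquad {}_\infty\mathfrak{M} = \mathfrak{M}\{e^{it\lambda};\ -\infty < t < \infty\}.$$

The manifolds ${}_n\mathfrak{M}$, ${}_{-\infty}\mathfrak{M}$, ${}_\infty\mathfrak{M}$ of $L^2(dF)$ correspond to $\mathfrak{M}_n$, $\mathfrak{M}_{-\infty}$, $\mathfrak{M}_\infty$
under the isomorphism of section 2.1.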


From the comments of section 2.1 it is clear that the solution of the first
prediction problem mentioned above is the projection $z_m$ of $x_{n+\nu}$ on the
manifold $\mathfrak{M}_{m,n}$ and the solution of the limiting problem is the projection
$z$ of $x_{n+\nu}$ on the manifold $\mathfrak{M}_n$. The corresponding errors of prediction are

$$\|z_m - x_{n+\nu}\| = \min_{y \in \mathfrak{M}_{m,n}} \|y - x_{n+\nu}\| = \delta_m,$$

$$\|z - x_{n+\nu}\| = \min_{y \in \mathfrak{M}_n} \|y - x_{n+\nu}\| = \delta.$$

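The following brief discussion shows that the prediction error $\delta_m \to \delta$ as $m \to \infty$
and that the predictor $z_m \to z$ in the mean square as $m \to \infty$. Now
$\mathfrak{M}_{m,n} \subset \mathfrak{M}_{m',n} \subset \mathfrak{M}_n$ if $m < m'$, so that $\delta_m$ is nonincreasing in $m$ and $\delta_m \ge \delta$;
moreover $\mathfrak{M}_n$ is the closure of $\bigcup_m \mathfrak{M}_{m,n}$. There is then a sequence of points
$y_m \in \mathfrak{M}_{m,n}$ such that $\|y_m - z\| \to 0$ as $m \to \infty$. But then

$$\|y_m - x_{n+\nu}\| \to \delta$$

as $m \to \infty$, and since $\delta \le \delta_m \le \|y_m - x_{n+\nu}\|$, we have $\delta_m \to \delta$ as $m \to \infty$. The
parallelogram-law argument of section 2.1 then implies that

$$\|z - z_m\| \to 0 \quad \text{as } m \to \infty.$$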
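The convergence $\delta_m \to \delta$ can be watched numerically. The sketch assumes, purely for illustration, a moving average $x_n = \epsilon_n + \theta\epsilon_{n-1}$ with unit-variance white noise and $|\theta| < 1$, so that $r(0) = 1 + \theta^2$, $r(\pm 1) = \theta$ and $r(k) = 0$ otherwise; for this process the infinite-past one-step error is $\delta^2 = 1$. The finite-past predictor $z_m$ is obtained from the normal equations $\sum_j \beta_j r(i - j) = r(\nu + i)$, $i = 0, \ldots, m$, where $\beta_j$ is the weight of $x_{n-j}$. The helper names are hypothetical.

```python
def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    beta = [0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / M[i][i]
    return beta

theta = 0.5  # assumed MA(1) parameter, |theta| < 1

def r(k):
    """Covariances of x_n = e_n + theta e_{n-1} with unit-variance white noise e_n."""
    k = abs(k)
    if k == 0:
        return 1 + theta ** 2
    return theta if k == 1 else 0.0

def finite_past_error(m, nu=1):
    """delta_m^2: error of the best predictor of x_{n+nu} from x_{n-m}, ..., x_n."""
    A = [[r(i - j) for j in range(m + 1)] for i in range(m + 1)]
    rhs = [r(nu + i) for i in range(m + 1)]
    beta = solve(A, rhs)            # beta[j] multiplies x_{n-j}
    return r(0) - sum(b * rr for b, rr in zip(beta, rhs))

errors = [finite_past_error(m) for m in range(25)]
print([round(e, 6) for e in errors[:5]])   # decreasing towards delta^2 = 1
```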

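We shall now concentrate on the limiting prediction problem where
$m = \infty$ and discuss it in some detail. Let $\varphi_{n,\nu}$ denote the predictor of $x_{n+\nu}$
given $x_n, x_{n-1}, \ldots$ and let $\phi_\nu(\lambda)$ correspond to $\varphi_{0,\nu}$ in the isomorphism between
$\mathfrak{M}_\infty$ and ${}_\infty\mathfrak{M}$. But then $e^{in\lambda}\phi_\nu(\lambda)$ corresponds to $\varphi_{n,\nu}$. The error of prediction is

$$\sigma_\nu^2 = E|x_{n+\nu} - \varphi_{n,\nu}|^2 = \int_{-\pi}^{\pi} |e^{i\nu\lambda} - \phi_\nu(\lambda)|^2\, dF(\lambda).$$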

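The error of prediction does not decrease when we predict further ahead, that is

$$0 \le \sigma_1^2 \le \sigma_2^2 \le \cdots.$$

This is so because $\mathfrak{M}_{n+1} \supset \mathfrak{M}_n$: by stationarity $\sigma_{\nu+1}$ is the distance from
$x_{n+\nu}$ to $\mathfrak{M}_{n-1} \subset \mathfrak{M}_n$, which cannot be smaller than the distance $\sigma_\nu$ from $x_{n+\nu}$
to $\mathfrak{M}_n$. But then it can be seen that the errors of
prediction $\sigma_\nu^2$ are either all zero or all positive. When the prediction errors are
zero, the manifolds $\mathfrak{M}_n$ are all the same and equal to the manifold $\mathfrak{M}_\infty$
generated by the process $x_n$. A positive error of prediction implies that each
manifold $\mathfrak{M}_n$ is a proper subspace of the next manifold $\mathfrak{M}_{n+1}$.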
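For a concrete check of the monotonicity, the sketch below takes, as an assumption for illustration only, an AR(1) process $x_n = a x_{n-1} + \epsilon_n$ with unit innovation variance, $r(k) = a^{|k|}/(1 - a^2)$. For this process the $\nu$-step predictor is $a^\nu x_n$ and $\sigma_\nu^2 = \sum_{j=0}^{\nu-1} a^{2j}$, so a short finite past already gives the exact answer. The normal equations are solved for several $\nu$ and the errors compared with that closed form; the function names are hypothetical.

```python
def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    beta = [0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / M[i][i]
    return beta

a = 0.8  # assumed AR(1) coefficient, innovations of unit variance

def r(k):
    return a ** abs(k) / (1 - a * a)

def prediction_error(nu, m=6):
    """Error of the best linear predictor of x_{n+nu} from x_{n-m}, ..., x_n."""
    A = [[r(i - j) for j in range(m + 1)] for i in range(m + 1)]
    rhs = [r(nu + i) for i in range(m + 1)]
    beta = solve(A, rhs)
    return r(0) - sum(b * rr for b, rr in zip(beta, rhs))

sigma2 = [prediction_error(nu) for nu in range(1, 8)]
exact = [sum(a ** (2 * j) for j in range(nu)) for nu in range(1, 8)]
for nu, (s, e) in enumerate(zip(sigma2, exact), start=1):
    print(nu, round(s, 6), round(e, 6))
```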

We first consider a process with an absolutely continuous spectral distribution
function so that

$$F(\lambda) = \int_{-\pi}^{\lambda} f(x)\, dx,$$

where $f(\lambda) = F'(\lambda)$. We also assume that

$$\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda > -\infty.$$

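The predictor will be obtained and the prediction error will be shown to be
positive. Since $\frac{1}{2}\log f(\lambda)$ is integrable (its integral is bounded above because
$\log f \le f$), it has a Fourier development

$$\tfrac{1}{2}\log f(\lambda) \sim \sum_{n=-\infty}^{\infty} a_n e^{-in\lambda}, \qquad a_n = \frac{1}{4\pi}\int_{-\pi}^{\pi} \log f(\lambda)\, e^{in\lambda}\, d\lambda, \quad a_{-n} = \bar{a}_n.$$

Consider the function

$$g(z) = a_0 + 2\sum_{n=1}^{\infty} a_n z^n.$$

The function $g(z)$ is analytic in the unit circle $|z| < 1$ and its real part

$$\operatorname{Re} g(re^{-i\lambda}) = \sum_{n=-\infty}^{\infty} a_n r^{|n|} e^{-in\lambda}$$

is the harmonic function with boundary value $\frac{1}{2}\log f(\lambda)$. This statement
is made more precise later on.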

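Let

$$c(z) = \sum_{j=0}^{\infty} c_j z^j, \qquad \text{where} \quad \sum_{j=0}^{\infty} c_j z^j = \sqrt{2\pi}\, e^{g(z)}, \quad |z| < 1.$$

Note that the function $c(z)$ is analytic in the unit circle $|z| < 1$ and
has no zeros there, since it is the exponential of the function $g(z)$, which is analytic in the unit circle. Moreover

$$\int_{-\pi}^{\pi} |c(re^{-i\lambda})|^2\, d\lambda = 2\pi \int_{-\pi}^{\pi} \exp\Big\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1 - r^2}{1 - 2r\cos(\theta - \lambda) + r^2}\,\log f(\theta)\, d\theta\Big\}\, d\lambda$$

$$\le 2\pi \int_{-\pi}^{\pi} \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\,\frac{1 - r^2}{1 - 2r\cos(\theta - \lambda) + r^2}\, d\theta\, d\lambda = 2\pi \int_{-\pi}^{\pi} f(\theta)\, d\theta$$

(this is Jensen's inequality, see e.g. Hardy, Littlewood and Pólya [1]),
so that $c(z) \in H^2$ (see appendix on complex variable theory). The
boundary value

$$c(e^{-i\lambda}) = \sum_{j=0}^{\infty} c_j e^{-ij\lambda} = \lim_{r \to 1-} c(re^{-i\lambda}), \qquad \sum_{j=0}^{\infty} |c_j|^2 < \infty,$$

exists almost everywhere and is such that

$$\frac{1}{2\pi}|c(e^{-i\lambda})|^2 = f(\lambda).$$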
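The construction of $c(z)$ from $\log f$ translates directly into a numerical recipe, usually called cepstral (log-spectral) factorization. The sketch below is an illustration under stated assumptions: the integral defining $a_n$ is replaced by a rectangle rule on $M$ grid points, the series are truncated after `n_coef` terms, and `outer_factor` is a hypothetical name. The Taylor coefficients of $c = \sqrt{2\pi}\,e^{g}$ follow from those of $g$ by the recursion $j c_j = \sum_{k=1}^{j} k g_k c_{j-k}$, obtained by comparing coefficients in $c' = g'c$. For the assumed moving-average density $f(\lambda) = (2\pi)^{-1}|1 + \theta e^{-i\lambda}|^2$ with $|\theta| < 1$ one expects $c(z) = 1 + \theta z$.

```python
import cmath
import math

def outer_factor(f, n_coef=8, M=2048):
    """Taylor coefficients c_0, ..., c_{n_coef-1} of c(z) = sqrt(2 pi) exp(g(z)),
    where g(z) = a_0 + 2 sum_{n>=1} a_n z^n and
    a_n = (1 / 4 pi) * integral of log f(lam) e^{i n lam} over [-pi, pi)."""
    h = 2 * math.pi / M
    lams = [-math.pi + k * h for k in range(M)]
    logf = [math.log(f(lam)) for lam in lams]
    a = [sum(lf * cmath.exp(1j * n * lam) for lf, lam in zip(logf, lams)) * h / (4 * math.pi)
         for n in range(n_coef)]
    g = [a[0]] + [2 * a[n] for n in range(1, n_coef)]   # Taylor coefficients of g
    c = [math.sqrt(2 * math.pi) * cmath.exp(g[0])]
    for j in range(1, n_coef):
        # coefficients of the exponential of a power series: j c_j = sum_k k g_k c_{j-k}
        c.append(sum(k * g[k] * c[j - k] for k in range(1, j + 1)) / j)
    return c

theta = 0.5   # assumed MA(1) parameter
f = lambda lam: abs(1 + theta * cmath.exp(-1j * lam)) ** 2 / (2 * math.pi)
c = outer_factor(f)
print([round(abs(cj), 6) for cj in c])   # should be close to [1, 0.5, 0, 0, ...]

# Check (1 / 2 pi) |c(e^{-i lam})|^2 = f(lam) at a few frequencies
for lam in (-2.0, 0.3, 1.7):
    c_val = sum(cj * cmath.exp(-1j * j * lam) for j, cj in enumerate(c))
    print(abs(c_val) ** 2 / (2 * math.pi) - f(lam))
```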


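Consider the function

$$e^{i\nu\lambda}\sum_{j=\nu}^{\infty} c_j e^{-ij\lambda} \Big/ c(e^{-i\lambda}).$$

The fact that $c(z) \ne 0$ when $|z| < 1$ implies that this function is an element
of ${}_0\mathfrak{M}$, as $1/c(z)$ is analytic in $|z| < 1$. On the other hand

$$e^{i\nu\lambda} - e^{i\nu\lambda}\sum_{j=\nu}^{\infty} c_j e^{-ij\lambda} \Big/ c(e^{-i\lambda}) = e^{i\nu\lambda}\sum_{j=0}^{\nu-1} c_j e^{-ij\lambda} \Big/ c(e^{-i\lambda})$$

is orthogonal to ${}_0\mathfrak{M}$, as its inner product with each of the functions $e^{-ik\lambda}$, $k \ge 0$,
generating ${}_0\mathfrak{M}$ is

$$\int_{-\pi}^{\pi} e^{i\nu\lambda}\,\frac{\sum_{j=0}^{\nu-1} c_j e^{-ij\lambda}}{c(e^{-i\lambda})}\, e^{ik\lambda} f(\lambda)\, d\lambda = \frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_{j=0}^{\nu-1} c_j e^{i(\nu - j + k)\lambda}\; \overline{c(e^{-i\lambda})}\, d\lambda = 0,$$

since $\overline{c(e^{-i\lambda})} = \sum_{l \ge 0} \bar{c}_l e^{il\lambda}$ contains only nonnegative powers of $e^{i\lambda}$ while
$\nu - j + k \ge 1$. But then it is clear that

$$\phi_\nu(\lambda) = e^{i\nu\lambda}\sum_{j=\nu}^{\infty} c_j e^{-ij\lambda} \Big/ c(e^{-i\lambda}).$$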

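Note that the errors of prediction are positive since

$$\sigma_1^2 = \int_{-\pi}^{\pi} |e^{i\lambda} - \phi_1(\lambda)|^2 f(\lambda)\, d\lambda = |c_0|^2 = 2\pi \exp\Big\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda\Big\} > 0$$

and $\sigma_\nu^2 \ge \sigma_1^2$. We have shown that if $x_t$ is a process with an absolutely continuous
spectral distribution function

$$F(\lambda) = \int_{-\pi}^{\lambda} f(x)\, dx,$$

where $\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda > -\infty$, then the errors of prediction are positive. In particular the error of
prediction one step ahead is

$$\sigma_1^2 = 2\pi \exp\Big\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda\Big\} > 0.$$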
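The one-step error formula is easy to evaluate numerically. The sketch below assumes two textbook spectral densities chosen only for illustration and approximates the integral by a rectangle rule; `one_step_error` is a hypothetical name. For the AR(1) density $f(\lambda) = (2\pi)^{-1}|1 - a e^{-i\lambda}|^{-2}$ (unit innovation variance) the formula should return 1. For the moving-average density $f(\lambda) = (2\pi)^{-1}|1 + 2e^{-i\lambda}|^2$ it should return 4 rather than 1: since $|1 + 2e^{-i\lambda}| = 2|1 + \tfrac{1}{2}e^{i\lambda}|$, the same spectrum is generated by a moving average with innovations of variance 4 whose polynomial has its zero outside the unit circle.

```python
import cmath
import math

def one_step_error(f, M=4096):
    """sigma_1^2 = 2 pi exp{(1 / 2 pi) * integral of log f over [-pi, pi)} (rectangle rule)."""
    h = 2 * math.pi / M
    mean_log = sum(math.log(f(-math.pi + k * h)) for k in range(M)) * h / (2 * math.pi)
    return 2 * math.pi * math.exp(mean_log)

a = 0.7   # assumed AR(1) coefficient
f_ar = lambda lam: 1 / (2 * math.pi * abs(1 - a * cmath.exp(-1j * lam)) ** 2)
f_ma = lambda lam: abs(1 + 2 * cmath.exp(-1j * lam)) ** 2 / (2 * math.pi)

print(one_step_error(f_ar))   # close to 1
print(one_step_error(f_ma))   # close to 4
```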

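The predictor is

$$\varphi_{n,\nu} = \int_{-\pi}^{\pi} e^{in\lambda}\phi_\nu(\lambda)\, dz(\lambda) = \int_{-\pi}^{\pi} e^{i(n+\nu)\lambda}\,\frac{\sum_{j=\nu}^{\infty} c_j e^{-ij\lambda}}{c(e^{-i\lambda})}\, dz(\lambda),$$

where $z(\lambda)$ is the stochastic set function corresponding to the process $x_t$ and

$$c(e^{-i\lambda}) = \sum_{j=0}^{\infty} c_j e^{-ij\lambda} = \lim_{r \to 1-} c(re^{-i\lambda}).$$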

Let us now reconsider the prediction problem assuming only that the
prediction error is positive and see what conclusions we can draw concerning
the structure of the observed process. Since the prediction errors are
positive, $\sigma_1 > 0$. Define the stochastic variables $\xi_n$ by

$$\xi_n = \frac{x_n - \varphi_{n-1,1}}{\sigma_1}.$$

Since $\varphi_{n-1,1}$ is the projection of $x_n$ on $\mathfrak{M}_{n-1}$, $\xi_n$ is orthogonal to $\mathfrak{M}_{n-1}$, and
$\mathfrak{M}_n$ is generated by $\mathfrak{M}_{n-1}$ and $\xi_n$. The process $\{\xi_n\}$ is an orthonormal
process,

$$E\xi_n\bar{\xi}_m = \delta_{n,m}.$$

Define the constants $\gamma_m$ by

$$\gamma_m = E x_n \bar{\xi}_{n-m} \qquad (\gamma_0 = \sigma_1).$$

Let

$$u_n = \sum_{m=0}^{\infty} \gamma_m \xi_{n-m}, \qquad v_n = x_n - u_n.$$

Note that $\gamma_m\xi_{n-m}$ is the projection of $x_n$ on the manifold $\mathfrak{M}_{n-m} \ominus \mathfrak{M}_{n-m-1}$,
the orthogonal complement of $\mathfrak{M}_{n-m-1}$ in $\mathfrak{M}_{n-m}$. The processes
$\{u_n\}$ and $\{v_n\}$ are both stationary. The stochastic variable $v_n \in \mathfrak{M}_{-\infty}$
and consequently it is orthogonal to the process $\{\xi_n\}$. It is therefore orthogonal
to the process $\{u_n\}$. It is clear from the construction given above
that the predictor

$$\varphi_{n-\nu,\nu} = \sum_{m=\nu}^{\infty} \gamma_m \xi_{n-m} + v_n.$$

Note that if the observed process were $\{u_n\}$ instead of $\{x_n\}$, the corresponding
predictor would be

$$\sum_{m=\nu}^{\infty} \gamma_m \xi_{n-m}.$$

It is of some interest to consider the function

$$\gamma(z) = \sum_{j=0}^{\infty} \gamma_j z^j.$$

This function is analytic in the unit circle $|z| < 1$ since $\sum |\gamma_j|^2 < \infty$. We
shall show that $\gamma(z)$ has no zeros in $|z| < 1$. Assume that there were a zero
$z_0$, $|z_0| < 1$. It is clear that $z_0 \ne 0$, for if it were, we would have $\gamma(0) = \gamma_0 =
\sigma_1 = 0$, contrary to our assumption that the error of prediction is positive.
Now consider the element

$$x = \sum_{j=0}^{\infty} \bar{z}_0^{\,j}\, \xi_{n-j}.$$

The element $x$ is in $\mathfrak{M}\{u_j;\ j \le n\}$ but not in $\mathfrak{M}\{u_j;\ j \le n-1\}$. On the other
hand, if $z_0$ were a zero of $\gamma(z)$, we would have

$$E u_n \bar{x} = \sum_{j=0}^{\infty} \gamma_j z_0^j = 0.$$

But this implies that $x$ is in $\mathfrak{M}\{u_j;\ j \le n-1\}$, a contradiction.

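We can also show that $\log|\gamma(e^{i\lambda})|$ is integrable. Since $\gamma(z)$ is analytic in
$|z| < 1$ and has no zeros there, it follows that $\log \gamma(z)$ is analytic in the unit
circle $|z| < 1$. But then

$$\log \gamma(0) = \frac{1}{2\pi i}\oint_{|z| = r} \frac{\log \gamma(z)}{z}\, dz, \qquad 0 < r < 1,$$

and on taking real parts of both sides, we have

$$\log|\gamma(0)| = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|\gamma(re^{i\lambda})|\, d\lambda.$$

But

$$\frac{1}{2\pi}\int_{|\gamma(e^{i\lambda})| \ge 1} \log|\gamma(e^{i\lambda})|\, d\lambda < \infty$$

since $0 \le \log y < y$ when $y \ge 1$ and $\gamma(e^{i\lambda})$ is quadratically integrable. Using
Fatou's lemma we have

$$-\frac{1}{2\pi}\int_{|\gamma(e^{i\lambda})| < 1} \log|\gamma(e^{i\lambda})|\, d\lambda \le -\log|\gamma(0)| + \frac{1}{2\pi}\int_{|\gamma(e^{i\lambda})| \ge 1} \log|\gamma(e^{i\lambda})|\, d\lambda,$$

so that $\log|\gamma(e^{i\lambda})|$ is integrable. In particular, this implies that $\gamma(e^{i\lambda})$ is not
zero for almost all $\lambda$.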

Since $\{u_n\}$, $\{v_n\}$, $\{\xi_n\}$ are stationary they have spectral representations

$$u_n = \int_{-\pi}^{\pi} e^{in\lambda}\, dz_u(\lambda), \qquad v_n = \int_{-\pi}^{\pi} e^{in\lambda}\, dz_v(\lambda), \qquad \xi_n = \int_{-\pi}^{\pi} e^{in\lambda}\, dz_\xi(\lambda),$$

where

$$E|dz_u(\lambda)|^2 = dF_u(\lambda), \qquad E|dz_v(\lambda)|^2 = dF_v(\lambda), \qquad E|dz_\xi(\lambda)|^2 = \frac{d\lambda}{2\pi}.$$

The linear representation of the process $\{u_n\}$ in terms of the process $\{\xi_n\}$
implies that $\{u_n\}$ has the spectral density (its spectral distribution function
is absolutely continuous)

$$\frac{1}{2\pi}|\gamma(e^{-i\lambda})|^2.$$
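As we have seen, this last expression is positive almost everywhere. But the
processes $\{u_n\}$ and $\{v_n\}$ are orthogonal so that $z_u(\lambda)$ and $z_v(\lambda)$ are orthogonal.
Now

$$z_u(\lambda) = \int_{-\pi}^{\lambda} \gamma(e^{-i\mu})\, dz_\xi(\mu),$$

so that

$$z(\lambda) = \int_{-\pi}^{\lambda} \big[\gamma(e^{-i\mu})\, dz_\xi(\mu) + dz_v(\mu)\big].$$

Since $\xi_n \in \mathfrak{M}_\infty$ there is an element $\Phi(\lambda) \in L^2(dF)$ such that

$$\xi_n = \int_{-\pi}^{\pi} \Phi(\lambda)\big[\gamma(e^{-i\lambda})\, dz_\xi(\lambda) + dz_v(\lambda)\big] = \int_{-\pi}^{\pi} e^{in\lambda}\, dz_\xi(\lambda).$$

On comparing integrands we see that

$$\Phi(\lambda)\gamma(e^{-i\lambda}) = e^{in\lambda}$$

for almost all $\lambda$ (Lebesgue measure) while

$$\Phi(\lambda) = 0$$

for almost all $\lambda$ ($F_v$ measure). Since $\gamma(e^{-i\lambda})$ is different from zero almost everywhere
(Lebesgue measure), the conditions on $\Phi(\lambda)$ mentioned above cannot be
satisfied unless $F_v(\lambda)$ increases only on a set of Lebesgue measure zero. Call
$S$ the set of increase of $F_v$. But then the spectral density of the process $x_n$ is

$$f(\lambda) = F'(\lambda) = \frac{1}{2\pi}|\gamma(e^{-i\lambda})|^2.$$

The spectral distribution function of the process $v_n$,

$$F_v(\lambda) = F(\lambda) - \int_{-\pi}^{\lambda} F'(\mu)\, d\mu,$$

consists of the singular part and the jumps of $F(\lambda)$.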
The error of prediction $\nu$ steps ahead

$$\sigma_\nu^2 = \sum_{j=0}^{\nu-1} |\gamma_j|^2$$

is the same for both the $x$ process and the $u$ process. In particular $\sigma_1^2 =
\gamma_0^2 > 0$. Since $\log|\gamma(z)|$ is a harmonic function in $|z| < 1$, it is clear that, for $r < R < 1$,

$$\log|\gamma(re^{i\lambda})| = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{R^2 - r^2}{R^2 - 2Rr\cos(\theta - \lambda) + r^2}\,\log|\gamma(Re^{i\theta})|\, d\theta$$

$$\le \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1 - r^2}{1 - 2r\cos(\theta - \lambda) + r^2}\,\log|\gamma(e^{i\theta})|\, d\theta$$

$$= \frac{1}{2}\log 2\pi + \frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{1 - r^2}{1 - 2r\cos(\theta - \lambda) + r^2}\,\log f(-\theta)\, d\theta = \log|c(re^{i\lambda})|$$

by Fatou's lemma (letting $R \to 1$). Thus $\log|c(re^{i\lambda})/\gamma(re^{i\lambda})|$ is a nonnegative harmonic
function in $|z| < 1$. Now $u_n$ is a process with an absolutely continuous spectral
distribution function

$$F_u(\lambda) = \int_{-\pi}^{\lambda} f(\mu)\, d\mu,$$

where $\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda > -\infty$. But the error of prediction one step ahead is

$$\sigma_1^2 = c_0^2 = \gamma_0^2 > 0,$$

and so $\log|c(0)/\gamma(0)|^2 = 0$. A nonnegative harmonic function which is zero
at an interior point is zero everywhere. Thus,

$$\log|c(z)| = \log|\gamma(z)|$$

when $|z| < 1$. The functions $c(z)$ and $\gamma(z)$ can therefore only differ by a constant
factor $e^{i\alpha}$ where $\alpha$ is real. Since $c(0) = \gamma(0) > 0$, it is clear that this factor
is one, that is

$$c(z) = \gamma(z)$$

and in particular $c(e^{-i\lambda}) = \gamma(e^{-i\lambda})$.

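Now

$$x_n - \varphi_{n-\nu,\nu} = \sum_{j=0}^{\nu-1} c_j \xi_{n-j} = \int_{-\pi}^{\pi} e^{i(n-\nu)\lambda}\big(e^{i\nu\lambda} - \phi_\nu(\lambda)\big)\, dz(\lambda)$$

$$= \int_{-\pi}^{\pi} e^{i(n-\nu)\lambda}\big(e^{i\nu\lambda} - \phi_\nu(\lambda)\big)\big(dz_u(\lambda) + dz_v(\lambda)\big),$$

so that

$$\sigma_\nu^2 = \sum_{j=0}^{\nu-1} |c_j|^2 = \int_{-\pi}^{\pi} |e^{i\nu\lambda} - \phi_\nu(\lambda)|^2\, dF(\lambda) = \int_{-\pi}^{\pi} |e^{i\nu\lambda} - \phi_\nu(\lambda)|^2\, dF_u(\lambda) + \int_{-\pi}^{\pi} |e^{i\nu\lambda} - \phi_\nu(\lambda)|^2\, dF_v(\lambda).$$

The function $\phi_\nu(\lambda)$ is in the closed linear manifold ${}_0\mathfrak{M}$ with weight function
$dF(\lambda)$ and hence in both manifolds with weight functions $dF_u(\lambda)$, $dF_v(\lambda)$
respectively. The first integral on the right is therefore at least the minimal
$\nu$-step error $\sum_{j=0}^{\nu-1}|c_j|^2$ of the $u$ process, so the second integral must vanish.
From the equation above, it follows that we must have

$$\phi_\nu(\lambda) = \begin{cases} e^{i\nu\lambda}\sum_{j=\nu}^{\infty} c_j e^{-ij\lambda} \big/ c(e^{-i\lambda}), & \lambda \notin S, \\ e^{i\nu\lambda}, & \lambda \in S, \end{cases}$$

where $S$ is the set of Lebesgue measure zero on which $F_v(\lambda)$ increases.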
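We have thus seen that the prediction error is positive if and only if

$$\int_{-\pi}^{\pi} \log F'(\lambda)\, d\lambda > -\infty.$$

If the prediction error is positive, the predictor is

$$\varphi_{n,\nu} = \int_{-\pi}^{\pi} e^{in\lambda}\phi_\nu(\lambda)\, dz(\lambda) = \int_{\lambda \notin S} e^{i(n+\nu)\lambda}\,\frac{\sum_{j=\nu}^{\infty} c_j e^{-ij\lambda}}{c(e^{-i\lambda})}\, dz(\lambda) + \int_{\lambda \in S} e^{i(n+\nu)\lambda}\, dz(\lambda),$$

where

$$c(e^{-i\lambda}) = \sum_{j=0}^{\infty} c_j e^{-ij\lambda} = \sqrt{2\pi}\,\lim_{r \to 1-} \exp\Big\{\frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{e^{-i\theta} + re^{-i\lambda}}{e^{-i\theta} - re^{-i\lambda}}\,\log F'(\theta)\, d\theta\Big\}.$$

Then the prediction error is

$$\sigma_\nu^2 = \sum_{j=0}^{\nu-1} |c_j|^2.$$

In any case the error of prediction one step ahead is

$$\sigma_1^2 = 2\pi \exp\Big\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \log F'(\lambda)\, d\lambda\Big\},$$

the right side being read as zero when $\int_{-\pi}^{\pi} \log F'(\lambda)\, d\lambda = -\infty$.

In the previous discussion, a process $\{x_n\}$ with positive prediction
errors is decomposed into two processes $\{u_n\}$, $\{v_n\}$ orthogonal to each other.
Here

$$x_n = u_n + v_n,$$

and $\{(x_n, u_n, v_n)\}$ is a stationary vector process. Both the processes $\{u_n\}$,
$\{v_n\}$ belong to $\mathfrak{M}_\infty$. The process $u_n$ has the same prediction errors $\sigma_\nu^2$ as
the process $x_n$. The manifolds $\mathfrak{M}\{u_m;\ m \le n\}$ have only the zero element
in common. A process $u_n$ such that $\bigcap_n \mathfrak{M}\{u_m;\ m \le n\} = \{0\}$ is called purely
nondeterministic. The process $v_n$ can be seen to have prediction errors $\sigma_\nu^2 = 0$
so that all the manifolds $\mathfrak{M}\{v_m;\ m \le n\}$ are the same. If a process $v_n$ is such
that all the manifolds $\mathfrak{M}\{v_m;\ m \le n\}$ are the same, it is said to be purely
deterministic.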
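The case $\int \log F'(\lambda)\, d\lambda = -\infty$ can be illustrated with a spectral density that vanishes on a set of positive measure. The sketch assumes, for illustration only, $f(\lambda) = 1/\pi$ for $|\lambda| < \pi/2$ and $f(\lambda) = 0$ otherwise, so that $r(k) = \int e^{ik\lambda} f(\lambda)\, d\lambda = 2\sin(k\pi/2)/(\pi k)$ and $r(0) = 1$. The one-step error formula then gives $\sigma_1^2 = 0$, and the finite-past one-step errors $\delta_m^2$ computed from the normal equations shrink towards zero as $m$ grows (the linear systems become increasingly ill-conditioned, so only moderate $m$ is used). The helper names are hypothetical.

```python
import math

def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    beta = [0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / M[i][i]
    return beta

def r(k):
    """Covariances for f = 1/pi on |lam| < pi/2, f = 0 elsewhere (assumed example)."""
    return 1.0 if k == 0 else 2 * math.sin(k * math.pi / 2) / (math.pi * k)

def one_step_error(m):
    """delta_m^2 for predicting x_{n+1} from x_{n-m}, ..., x_n."""
    A = [[r(i - j) for j in range(m + 1)] for i in range(m + 1)]
    rhs = [r(1 + i) for i in range(m + 1)]
    beta = solve(A, rhs)
    return r(0) - sum(b * rr for b, rr in zip(beta, rhs))

errors = [one_step_error(m) for m in range(13)]
for m, e in enumerate(errors):
    print(m, round(e, 6))   # shrinks towards sigma_1^2 = 0
```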

From our discussion of the prediction problem, it is clear that a process
$u_n$ is purely nondeterministic if and only if its spectral distribution function

$$F_u(\lambda) = \int_{-\pi}^{\lambda} f(x)\, dx$$

is absolutely continuous and

$$\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda > -\infty.$$

It is also clear that $\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda > -\infty$ ($f(\lambda)$ integrable) if and only if
there is a function $a(z) \in H^2$ such that $a(z) \ne 0$ when $|z| < 1$ and

$$\frac{1}{2\pi}|a(e^{-i\lambda})|^2 = f(\lambda).$$

We can even remove the restriction that $a(z) \ne 0$ when $|z| < 1$. The theorem
of F. Riesz (see appendix on complex variable theory) indicates that if
$a(z)$ does have zeros in $|z| < 1$, they can be removed so as to obtain a
function $b(z) \in H^2$, $b(z) \ne 0$ when $|z| < 1$, still satisfying

$$\frac{1}{2\pi}|b(e^{-i\lambda})|^2 = f(\lambda).$$

In the discussion of this section a purely nondeterministic process $u_n$ was
shown to have a one-sided moving average representation

$$u_n = \sum_{j=0}^{\infty} c_j \xi_{n-j}$$

in terms of an orthonormal process $\xi_n$ where $\mathfrak{M}\{\xi_m;\ m \le n\} = \mathfrak{M}\{u_m;\ m \le n\}$.
The coefficients $c_j$ are those of the function

$$c(z) = \sum_{j=0}^{\infty} c_j z^j = \sqrt{2\pi}\,\exp\Big\{\frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{e^{-i\theta} + z}{e^{-i\theta} - z}\,\log f(\theta)\, d\theta\Big\}.$$

Note that among all functions $a(z) \in H^2$ with $a(0) > 0$ and such that

$$\frac{1}{2\pi}|a(e^{-i\lambda})|^2 = f(\lambda),$$

$c(z)$ is the one with maximal absolute value, that is, $|a(z)| \le |c(z)|$ for $|z| < 1$. This can be seen from some of
the earlier discussion. It can be seen that the representation given above
for a purely nondeterministic process $u_n$ is unique except for a factor of
absolute value one.

The discussion also implies that a process $v_n$ is purely deterministic if and
only if

$$\int_{-\pi}^{\pi} \log F'(\lambda)\, d\lambda = -\infty.$$

Thus $v_n$ may have an absolutely continuous spectral distribution function
as long as $\int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda = -\infty$.

Let $x_n = \int_{-\pi}^{\pi} e^{in\lambda}\, dz(\lambda)$ be a stationary process with positive prediction errors,
$E|dz(\lambda)|^2 = dF(\lambda)$. We shall show that there is then a unique decomposition
$x_n = u_n + v_n$ into a purely nondeterministic process $\{u_n\}$ and a purely
deterministic process $\{v_n\}$ orthogonal to each other, both belonging to $\mathfrak{M}_\infty$,
and such that $\{(x_n, u_n, v_n)\}$ is a stationary vector process. This decomposition
is the one constructed in the discussion of the prediction problem (see Wold
[1]). Since $\{u_n\}$ belongs to $\mathfrak{M}_\infty$, it follows that $u_n$ has the representation

$$u_n = \int_{-\pi}^{\pi} h(\lambda, n)\, dz(\lambda).$$

The stationarity of $\{(x_n, u_n, v_n)\}$ implies that

$$E x_n \bar{u}_m = \int_{-\pi}^{\pi} e^{in\lambda}\,\overline{h(\lambda, m)}\, dF(\lambda) = E x_{n-m}\bar{u}_0 = \int_{-\pi}^{\pi} e^{i(n-m)\lambda}\,\overline{h(\lambda, 0)}\, dF(\lambda)$$

so that

$$\int_{-\pi}^{\pi} e^{in\lambda}\big[\overline{h(\lambda, m)} - e^{-im\lambda}\,\overline{h(\lambda, 0)}\big]\, dF(\lambda) = 0$$

for all $n$. But then

$$h(\lambda, m) = e^{im\lambda} h(\lambda, 0)$$

almost everywhere (Lebesgue measure) since $F'(\lambda) > 0$ almost everywhere.
Thus

$$u_m = \int_{-\pi}^{\pi} e^{im\lambda} h(\lambda)\, dz(\lambda), \qquad h(\lambda) = h(\lambda, 0).$$

In the same way, using the fact that $x_n = u_n + v_n$, we can show that

$$v_m = \int_{-\pi}^{\pi} e^{im\lambda}\big(1 - h(\lambda)\big)\, dz(\lambda).$$

The orthogonality of the processes $\{u_n\}$, $\{v_n\}$ implies that

$$\int_{-\pi}^{\pi} e^{im\lambda} h(\lambda)\,\overline{\big(1 - h(\lambda)\big)}\, dF(\lambda) = 0$$

for all $m$, so that $h - |h|^2 = h(1 - \bar{h})$ vanishes almost everywhere with respect to $F$ measure.
The function $h(\lambda)$ must therefore be either zero or one almost everywhere with respect
to $F$ measure and in particular almost everywhere with respect to Lebesgue
measure. Let $S$ be the set on which $h(\lambda) = 0$. The set $S$ must be of zero Lebesgue
measure, since $u_n$ is purely nondeterministic and thus has a spectral
density which is positive almost everywhere (Lebesgue measure). The absolutely
continuous spectral distribution function of $u_n$ is thus

$$F_u(\lambda) = \int_{-\pi}^{\lambda} F'(x)\, dx$$

and the spectral distribution function of $v_n$ is

$$F_v(\lambda) = F(\lambda) - \int_{-\pi}^{\lambda} F'(x)\, dx.$$

The set of increase of $F_v(\lambda)$ is $S$. This is the decomposition carried out in
the discussion of the prediction problem.

It is worthwhile noting that this decomposition of $x_n$ into a purely nondeterministic
and a purely deterministic process is no longer unique if the
condition that $\{u_n\}$, $\{v_n\}$ be in $\mathfrak{M}_\infty$ is omitted. There are then in general
an infinite number of such decompositions. The terminology purely nondeterministic
and purely deterministic is very suggestive but should not be
taken too literally. The problems considered in this chapter are all linear
least squares problems. It is interesting to contrast the linear prediction
problem with the general prediction problem in which there is no restriction
of linearity on the estimates. The problem of predicting $x_{n+1}$ given $x_n$,
$x_{n-1}, \ldots$ is that of finding the quadratically integrable stochastic variable
$f(x_n, x_{n-1}, \ldots)$ defined on the sample space of $x_n, x_{n-1}, \ldots$ such that

$$E|x_{n+1} - f(x_n, x_{n-1}, \ldots)|^2$$

is minimized. The set of all quadratically integrable $f(x_n, x_{n-1}, \ldots)$ is a
closed linear submanifold of the closed linear manifold of quadratically
integrable stochastic variables defined on the sample space of the process
$\{x_n\}$. But the unique minimizing $f(x_n, x_{n-1}, \ldots)$ must be such that

$$x_{n+1} - f(x_n, x_{n-1}, \ldots)$$

is orthogonal to this submanifold, and so we must have

$$f(x_n, x_{n-1}, \ldots) = E(x_{n+1} \mid x_n, x_{n-1}, \ldots).$$

Perhaps one should call a process $\{x_n\}$ such that

$$x_{n+1} = E(x_{n+1} \mid x_n, x_{n-1}, \ldots)$$

purely deterministic. The process

$$x_n = e^{in\xi},$$

where $\xi$ is a stochastic variable with distribution function $F(\lambda)$, would be
purely deterministic in this new sense (indeed $x_{n+1} = x_n^2/x_{n-1}$). It would be purely deterministic
in the linear sense if and only if

$$\int_{-\pi}^{\pi} \log F'(\lambda)\, d\lambda = -\infty.$$
It  is  important  to  note  that  if  a  process  {xn}  is  normal,  the  solutions  of  the 
linear  least  squares  problem  and  the  corresponding  general  least  squares  problem 
are  identical. 
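A short simulation contrasts the two notions for the process $x_n = e^{in\xi}$. It assumes, for illustration, that $\xi$ is uniformly distributed on $(-\pi, \pi]$, so that $F'(\lambda) = 1/2\pi$ and $E x_n\bar{x}_m = 0$ for $n \ne m$: the variables are orthonormal, the best linear predictor of $x_{n+1}$ from the past is $0$ and its error is $1$, so the process is purely nondeterministic in the linear sense. Yet the nonlinear predictor $x_n^2/x_{n-1}$ reproduces $x_{n+1}$ exactly, since $e^{i(n+1)\xi} = e^{2in\xi}/e^{i(n-1)\xi}$. The process is of course not normal, so the last remark of the text does not apply.

```python
import cmath
import math
import random

random.seed(2)
n, N = 5, 20000
draws = [random.uniform(-math.pi, math.pi) for _ in range(N)]   # samples of xi

def x(k, xi):
    return cmath.exp(1j * k * xi)

# Sample covariance E x_{n+1} conj(x_n) is near 0, so the best linear predictor is 0 ...
cov = sum(x(n + 1, xi) * x(n, xi).conjugate() for xi in draws) / N
linear_error = sum(abs(x(n + 1, xi)) ** 2 for xi in draws) / N   # error of predicting 0
# ... while the nonlinear predictor x_n^2 / x_{n-1} is exact.
nonlinear_error = sum(abs(x(n + 1, xi) - x(n, xi) ** 2 / x(n - 1, xi)) ** 2
                      for xi in draws) / N
print(abs(cov), linear_error, nonlinear_error)
```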

We now consider the prediction problem in the simple case of an autoregressive
scheme. Let $\{x_n\}$ be a stationary process with an absolutely continuous
spectral distribution whose spectral density is

$$f(\lambda) = \frac{1}{2\pi}\,\frac{1}{\big|\sum_{j=0}^{p} b_j e^{-ij\lambda}\big|^2}. \qquad (1)$$

The polynomial $\sum_{j=0}^{p} b_j z^j$ cannot vanish when $|z| = 1$ since $f(\lambda)$ is integrable.
We can suppose that all the roots of the polynomial have modulus
greater than one. For if

$$\sum_{j=0}^{p} b_j z^j = (z - z_0)\sum_{j=0}^{p-1} a_j z^j$$

had the root $z_0$ of modulus less than one, we could consider instead

$$(z\bar{z}_0 - 1)\sum_{j=0}^{p-1} a_j z^j,$$

which has the same absolute value on $|z| = 1$ (there $z\bar{z}_0 - 1 = z(\bar{z}_0 - \bar{z})$, so
$|z\bar{z}_0 - 1| = |z - z_0|$) but one zero less in $|z| < 1$. By iterating this procedure we can remove all roots
of modulus less than one and preserve the absolute value of the function
on $|z| = 1$. Now $x_n$ has the spectral representation

$$x_n = \int_{-\pi}^{\pi} e^{in\lambda}\, dz(\lambda).$$
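The root-reflection step can be checked numerically. In this sketch the polynomial, with one root $z_0 = 0.5i$ inside the unit circle and the remaining factor $1 + 0.3z$, is an arbitrary illustrative choice, and `poly_mul`/`poly_eval` are hypothetical helpers; coefficients are stored in increasing powers of $z$. Replacing the factor $z - z_0$ by $z\bar{z}_0 - 1$ moves the root to $1/\bar{z}_0$, of modulus 2, while leaving the absolute value unchanged on $|z| = 1$.

```python
import cmath
import math

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists in increasing powers of z."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_eval(p, z):
    return sum(c * z ** k for k, c in enumerate(p))

z0 = 0.5j                                         # assumed root inside the unit circle
a = [1.0, 0.3]                                    # remaining factor 1 + 0.3 z (root -10/3)
b = poly_mul([-z0, 1], a)                         # (z - z0) a(z)
b_flipped = poly_mul([-1, z0.conjugate()], a)     # (z conj(z0) - 1) a(z)

for k in range(8):
    z = cmath.exp(2j * math.pi * k / 8)           # points on |z| = 1
    print(abs(poly_eval(b, z)), abs(poly_eval(b_flipped, z)))

new_root = 1 / z0.conjugate()
print(abs(new_root), abs(poly_eval(b_flipped, new_root)))   # modulus 2, value 0
```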

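Consider the stochastic variable

$$\xi_n = \sum_{j=0}^{p} b_j x_{n-j} = \int_{-\pi}^{\pi} e^{in\lambda}\sum_{j=0}^{p} b_j e^{-ij\lambda}\, dz(\lambda).$$

The process $\{\xi_n\}$ is orthonormal since

$$E\xi_n\bar{\xi}_m = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(n-m)\lambda}\, d\lambda = \delta_{n,m}.$$

Moreover

$$E x_n\bar{\xi}_m = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{e^{i(n-m)\lambda}}{\sum_{j=0}^{p} b_j e^{-ij\lambda}}\, d\lambda = 0$$

when $m > n$, since the roots of $\sum_{j=0}^{p} b_j z^j$ are of modulus greater than one, so that
$1/\sum_j b_j z^j$ has a power series expansion in $z = e^{-i\lambda}$ converging on $|z| \le 1$.
The stochastic variable $\xi_n$ is orthogonal to $\mathfrak{M}_{n-1}$. Thus

$$x_{n+1} = -\frac{1}{b_0}\big(b_1 x_n + \cdots + b_p x_{n-p+1}\big) + \frac{1}{b_0}\,\xi_{n+1},$$

where $\xi_{n+1}$ is orthogonal to $\mathfrak{M}_n$. A process with an absolutely continuous
spectrum where $f(\lambda)$ is given by (1) is therefore an autoregressive scheme.
The linear predictor is

$$\varphi_{n,1} = -\frac{1}{b_0}\big(b_1 x_n + \cdots + b_p x_{n-p+1}\big)$$

and the predictors $\varphi_{n,\nu}$ can be obtained by iterating this procedure.
The error of prediction is

$$\sigma_1^2 = \frac{1}{|b_0|^2}.$$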
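The autoregressive formulas can be checked against the general theory. The sketch below assumes, for illustration, $p = 2$ and $\sum b_j z^j = 2(1 - 0.5z)(1 + 0.4z) = 2 - 0.2z - 0.4z^2$, whose roots $2$ and $-2.5$ lie outside the unit circle. It computes the covariances $r(k) = \int e^{ik\lambda} f(\lambda)\, d\lambda$ of density (1) by quadrature, solves the finite-past normal equations, and compares the result with the predictor weights $-b_1/b_0 = 0.1$, $-b_2/b_0 = 0.2$ and the error $1/|b_0|^2 = 0.25$. It also evaluates $2\pi\exp\{(1/2\pi)\int \log f\}$, which should give the same error.

```python
import cmath
import math

b = [2.0, -0.2, -0.4]   # assumed b_0, b_1, b_2: 2(1 - 0.5 z)(1 + 0.4 z), roots 2 and -2.5

def f(lam):
    """Spectral density (1): 1 / (2 pi |sum_j b_j e^{-ij lam}|^2)."""
    B = sum(bj * cmath.exp(-1j * j * lam) for j, bj in enumerate(b))
    return 1 / (2 * math.pi * abs(B) ** 2)

M = 4096
step = 2 * math.pi / M
grid = [-math.pi + k * step for k in range(M)]
fvals = [f(lam) for lam in grid]

def r(k):
    """r(k) = integral of e^{ik lam} f(lam); f is even for real b_j, so only cosines remain."""
    return sum(math.cos(k * lam) * fv for lam, fv in zip(grid, fvals)) * step

def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    Mx = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(Mx[i][k]))
        Mx[k], Mx[p] = Mx[p], Mx[k]
        for i in range(k + 1, n):
            factor = Mx[i][k] / Mx[k][k]
            for j in range(k, n + 1):
                Mx[i][j] -= factor * Mx[k][j]
    beta = [0] * n
    for i in reversed(range(n)):
        s = Mx[i][n] - sum(Mx[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / Mx[i][i]
    return beta

m = 4   # observe x_n, x_{n-1}, ..., x_{n-4}
A = [[r(i - j) for j in range(m + 1)] for i in range(m + 1)]
rhs = [r(1 + i) for i in range(m + 1)]
beta = solve(A, rhs)   # weights of x_n, x_{n-1}, ...
error = r(0) - sum(bt * rr for bt, rr in zip(beta, rhs))
szego = 2 * math.pi * math.exp(sum(math.log(fv) for fv in fvals) * step / (2 * math.pi))

print([round(bt, 6) for bt in beta])      # expected near [0.1, 0.2, 0, 0, 0]
print(round(error, 6), round(szego, 6))   # both expected near 1 / |b_0|^2 = 0.25
```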

We shall only state some of the corresponding results for the prediction
problem when the process $x_t$ is a continuous parameter weakly stationary
process. A detailed discussion of some aspects of this problem may be found
in the papers of K. Karhunen [1, 2] and in J. L. Doob's book [2] on stochastic
processes. The process $x_t$ has the spectral representation

$$x_t = \int_{-\infty}^{\infty} e^{it\lambda}\, dz(\lambda), \qquad E\,dz(\lambda)\,\overline{dz(\mu)} = \delta_{\lambda\mu}\, dF(\lambda).$$

Let $\mathfrak{M}_t$ as before denote

$$\mathfrak{M}_t = \mathfrak{M}\{x_\tau;\ \tau \le t\},$$

where the parameters are now continuous. Given that $x_u$, $u \le t$, has been
observed, the best linear least squares predictor $\varphi_{t,\tau}$ of $x_{t+\tau}$, $\tau > 0$, is the
projection of $x_{t+\tau}$ on $\mathfrak{M}_t$. The prediction error is

$$\sigma_\tau^2 = E|\varphi_{t,\tau} - x_{t+\tau}|^2$$

and as before either $\sigma_\tau = 0$ for all $\tau > 0$ or $\sigma_\tau > 0$ for all $\tau > 0$. The first case
we again call that of a purely deterministic process. If the process $x_t$ has
positive prediction errors and is such that $\bigcap_t \mathfrak{M}_t = \{0\}$, we call it purely
nondeterministic. The prediction errors of a process $x_t$ are now positive if
and only if

$$\int_{-\infty}^{\infty} \frac{\log F'(\lambda)}{1 + \lambda^2}\, d\lambda > -\infty.$$

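If a process $\{x_t\}$ has positive prediction errors, there is a unique decomposition
of the process

$$x_t = u_t + v_t$$

into a purely nondeterministic process $\{u_t\}$ and a purely deterministic
process $\{v_t\}$ orthogonal to each other, where $\{(x_t, u_t, v_t)\}$ is a stationary vector
process and $u_t$ and $v_t$ belong to $\mathfrak{M}_\infty$. Here

$$u_t = \int_{-\infty}^{\infty} e^{it\lambda}\, dz_u(\lambda), \qquad E\,dz_u(\lambda)\,\overline{dz_u(\mu)} = \delta_{\lambda\mu}\, dF_u(\lambda),$$

$$v_t = \int_{-\infty}^{\infty} e^{it\lambda}\, dz_v(\lambda), \qquad E\,dz_v(\lambda)\,\overline{dz_v(\mu)} = \delta_{\lambda\mu}\, dF_v(\lambda),$$

where

$$F_u(\lambda) = \int_{-\infty}^{\lambda} F'(x)\, dx, \qquad F_v(\lambda) = F(\lambda) - \int_{-\infty}^{\lambda} F'(x)\, dx.$$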

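Let

$$c(w) = \sqrt{2\pi}\,\exp\Big\{-\frac{1}{2\pi i}\int_{-\infty}^{\infty}\frac{1+\lambda w}{\lambda - w}\,\frac{\log F'(\lambda)}{1+\lambda^2}\,d\lambda\Big\}.$$

The function $c(w)$ is a function of the complex variable $w = u + iv$ that is analytic and nonzero in the halfplane $v < 0$. Moreover

$$\int_{-\infty}^{\infty}|c(u+iv)|^2\,du \le 2\pi\int_{-\infty}^{\infty}F'(\lambda)\,d\lambda$$

for all $v < 0$. The limit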


$$c(u) = \lim_{v \to 0-}c(u + iv)$$

exists for almost all $u$ and

$$\int_{-\infty}^{\infty}|c(u+iv) - c(u)|^2\,du \to 0$$

as $v \to 0-$. One can show that

$$\frac{1}{2\pi}|c(u)|^2 = f(u) = F'(u)$$

almost everywhere. The Fourier transform

$$c^*(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{itu}c(u)\,du$$
of $c(u)$ vanishes for $t < 0$. The process $u_t$ has the one-sided moving average representation

$$u_t = \int_{-\infty}^{t}c^*(t - u)\,d\xi(u),\qquad \int_0^{\infty}|c^*(u)|^2\,du < \infty,$$

in terms of a process $\xi(u)$ where $E\,d\xi(u)\,\overline{d\xi(v)} = \delta_{uv}\,du$ and $\xi(u) - \xi(v) \in \mathfrak{M}\{u_\tau;\ \tau \le t\}$ if $u, v \le t$. Except for a multiplicative constant of absolute value one, it can be shown that this is the unique one-sided moving average representation of $u_t$ in terms of a process $\xi(u)$ of the type described above.

The predictor is

$$\varphi_{t,\tau} = \int_{-\infty}^{t}c^*(t + \tau - u)\,d\xi(u) + v_{t+\tau}$$

and the prediction error is

$$\sigma_\tau^2 = \int_0^{\tau}|c^*(s)|^2\,ds.$$
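As a worked check (an example of our own, not taken from the references), take the covariance $r_t = e^{-\alpha|t|}$, so that $f(\lambda) = \alpha/\{\pi(\alpha^2 + \lambda^2)\}$. The function $c(u) = \sqrt{2\alpha}/(\alpha + iu)$ extends analytically and without zeros to $v < 0$ and satisfies $|c(u)|^2/2\pi = f(u)$, so by the uniqueness statement it can be taken as the factor above; its transform is $c^*(t) = \sqrt{2\alpha}\,e^{-\alpha t}$ for $t \ge 0$, and the formula gives $\sigma_\tau^2 = 1 - e^{-2\alpha\tau}$. The sketch checks the factorization at a point and evaluates the error integral by the trapezoidal rule.

```python
import math


def spectral_density(lam, alpha):
    """f(lambda) for the covariance r_t = exp(-alpha |t|)."""
    return alpha / (math.pi * (alpha ** 2 + lam ** 2))


def c_boundary(u, alpha):
    """Boundary values c(u) = sqrt(2 alpha) / (alpha + i u) of the assumed factor."""
    return math.sqrt(2 * alpha) / (alpha + 1j * u)


def c_star(t, alpha):
    """c*(t) = sqrt(2 alpha) exp(-alpha t) for t >= 0, and 0 for t < 0."""
    return math.sqrt(2 * alpha) * math.exp(-alpha * t) if t >= 0 else 0.0


def prediction_error(tau, alpha, steps=2000):
    """sigma_tau^2 = integral_0^tau |c*(s)|^2 ds by the trapezoidal rule."""
    h = tau / steps
    total = 0.5 * (c_star(0.0, alpha) ** 2 + c_star(tau, alpha) ** 2)
    total += sum(c_star(k * h, alpha) ** 2 for k in range(1, steps))
    return total * h


alpha, tau = 1.3, 0.7
print(prediction_error(tau, alpha), 1 - math.exp(-2 * alpha * tau))
```

The closed form agrees with the familiar fact that for this process the best predictor of $x_{t+\tau}$ is $e^{-\alpha\tau}x_t$.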

Some  discussion  of  multidimensional  linear  least  squares  prediction  can 
be  found  in  J.  L.  Doob's  book  on  stochastic  processes. 

2.3.  Interpolation 

Several  interpolation  problems  are  considered  in  this  section.  At  times 
for  convenience  a  restriction  will  be  imposed  on  the  spectrum  of  the  process 
studied. 

Let $x_n$ be a discrete parameter process, $n = \ldots, -1, 0, 1, \ldots$. Assume that the entire time series has been observed except at the time point $n = 0$. Thus $x_n$, $n \ne 0$, has been observed. We would like to estimate $x_0$ by that linear combination of the observed stochastic variables which is best in the sense of smallest mean square error. Let us assume that $F(\lambda)$ is absolutely continuous and that $1/f(\lambda)$ is integrable, where $f(\lambda)$ is the spectral density of the process. The process $x_n$ has the spectral representation

$$x_n = \int_{-\pi}^{\pi}e^{in\lambda}\,dz(\lambda).$$

Let the linear estimate of $x_0$ be

$$x_0^* = \int_{-\pi}^{\pi}c(\lambda)\,dz(\lambda).$$
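The estimate $x_0^*$ is the projection of $x_0$ on the manifold $\mathfrak{M}\{x_n;\ n \ne 0\}$ and so $x_0 - x_0^*$ is orthogonal to this manifold, that is

$$E(x_0 - x_0^*)\bar{x}_n = \int_{-\pi}^{\pi}(1 - c(\lambda))e^{-in\lambda}f(\lambda)\,d\lambda = 0,\qquad n \ne 0.$$

It is clear that $c(\lambda)$ will satisfy these conditions if and only if it is of the form

$$c(\lambda) = 1 - \frac{k}{f(\lambda)},$$

where $k$ is a constant. But the error of interpolation

$$\sigma^2 = E|x_0 - x_0^*|^2 = \int_{-\pi}^{\pi}|1 - c(\lambda)|^2f(\lambda)\,d\lambda = \bar{k}\int_{-\pi}^{\pi}(1 - c(\lambda))\,d\lambda = 2\pi\bar{k},$$

since $c(\lambda)$ is an element of the closed linear manifold generated by $e^{in\lambda}$, $n \ne 0$, with weight function $f(\lambda)$, so that $\int_{-\pi}^{\pi}c(\lambda)\,d\lambda = 0$. The number $k$ is real and given by

$$k = \frac{2\pi}{\int_{-\pi}^{\pi}\frac{d\lambda}{f(\lambda)}}.$$

The error of interpolation is therefore

$$\sigma^2 = \frac{4\pi^2}{\int_{-\pi}^{\pi}\frac{d\lambda}{f(\lambda)}}.$$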
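The formula is easy to check numerically. The sketch below assumes a first order autoregressive density $f(\lambda) = s^2/\{2\pi|1 - \rho e^{-i\lambda}|^2\}$ (our choice of example) and evaluates $\int d\lambda/f(\lambda)$ by the midpoint rule, which is exact here because $1/f$ is a trigonometric polynomial of degree one. The result should be $s^2/(1+\rho^2)$, smaller than the one-step prediction error $s^2$ of the same process, as one expects when observations on both sides of $n = 0$ are available.

```python
import math


def ar1_density(rho, s2):
    """f(lambda) = s2 / (2 pi |1 - rho e^{-i lambda}|^2)."""
    return lambda lam: s2 / (2 * math.pi * (1 - 2 * rho * math.cos(lam) + rho ** 2))


def interpolation_error(f, grid=4096):
    """sigma^2 = 4 pi^2 / integral_{-pi}^{pi} d lambda / f(lambda), midpoint rule."""
    h = 2 * math.pi / grid
    integral = sum(h / f(-math.pi + (k + 0.5) * h) for k in range(grid))
    return 4 * math.pi ** 2 / integral


rho, s2 = 0.6, 2.0
print(interpolation_error(ar1_density(rho, s2)), s2 / (1 + rho ** 2))
```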

A slightly more general problem would be that in which $x_n$, $n = 0, 1, \ldots, N$, are not observed and one again wishes to obtain the best linear least squares estimate of $x_0$ in terms of the observed stochastic variables. An argument analogous to that given above indicates that the error of interpolation is

$$\sigma^2 = E|x_0 - x_0^*|^2 = \frac{4\pi^2}{\displaystyle\min_{c_0 = 1}\int_{-\pi}^{\pi}\Big|\sum_{j=0}^{N}c_je^{ij\lambda}\Big|^2\frac{d\lambda}{f(\lambda)}}. \qquad (2)$$

The interpolator is

$$x_0^* = \int_{-\pi}^{\pi}c(\lambda)\,dz(\lambda),$$

where

$$1 - c(\lambda) = \frac{\sigma^2}{2\pi f(\lambda)}\sum_{j=0}^{N}c_je^{ij\lambda}.$$

Here $c_0 = 1$ and the $c_j$'s, $0 < j \le N$, are the numbers minimizing the expression in the denominator of (2). If $1/f(\lambda)$ is not integrable, it can be seen that the error of interpolation $\sigma^2 = 0$. We can approximate $f(\lambda)$ by


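$$f_N(\lambda) = \begin{cases}f(\lambda) & \text{if } f(\lambda) \ge 1/N,\\ 1/N & \text{if } f(\lambda) < 1/N.\end{cases}$$

A process with spectral density $f_N(\lambda)$ has positive error of interpolation. But as $N \to \infty$, the error of interpolation tends to zero, since $\int_{-\pi}^{\pi}d\lambda/f_N(\lambda) \to \infty$. Moreover $f(\lambda) \le f_N(\lambda)$, so the error of interpolation for the process $\{x_n\}$ is not greater than the error for a process with spectral density $f_N(\lambda)$. Thus $\sigma^2 = 0$.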
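The minimization in (2) reduces to a small linear system. Writing $g_m = \int_{-\pi}^{\pi}\cos m\lambda\,d\lambda/f(\lambda)$ (we assume $f$ even, so the $g_m$ are real), the denominator of (2) is $c^*Gc$ with $G_{jk} = g_{|j-k|}$, and its minimum subject to $c_0 = 1$ is $1/(G^{-1})_{00}$. The sketch below (all function names are ours) applies this to the autoregressive example $f(\lambda) = s^2/\{2\pi|1-\rho e^{-i\lambda}|^2\}$; for $N = 1$ it should reproduce $s^2(1+\rho^2)/(1+\rho^2+\rho^4)$, the value one also gets by conditioning $x_0$ on $x_{-1}$ and $x_2$ alone.

```python
import math


def ar1_density(rho, s2):
    """f(lambda) = s2 / (2 pi |1 - rho e^{-i lambda}|^2)."""
    return lambda lam: s2 / (2 * math.pi * (1 - 2 * rho * math.cos(lam) + rho ** 2))


def inverse_moments(f, m_max, grid=4096):
    """g_m = integral_{-pi}^{pi} cos(m lambda) / f(lambda) d lambda, midpoint rule."""
    h = 2 * math.pi / grid
    lams = [-math.pi + (k + 0.5) * h for k in range(grid)]
    weights = [h / f(lam) for lam in lams]
    return [sum(math.cos(m * lam) * w for lam, w in zip(lams, weights))
            for m in range(m_max + 1)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def block_interpolation_error(f, n_missing):
    """Error (2) when x_0, ..., x_N are unobserved, N = n_missing."""
    g = inverse_moments(f, n_missing)
    size = n_missing + 1
    gram = [[g[abs(j - k)] for k in range(size)] for j in range(size)]
    y = solve(gram, [1.0] + [0.0] * n_missing)   # first column of G^{-1}
    return 4 * math.pi ** 2 * y[0]                # 4 pi^2 / min = 4 pi^2 (G^{-1})_00


rho, s2 = 0.6, 2.0
f = ar1_density(rho, s2)
print([block_interpolation_error(f, N) for N in (0, 1, 2)])
```

As more values are missing the error grows, but it stays below the variance $s^2/(1-\rho^2)$ of the process.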

The following interpolation problem might arise when one is observing a continuous parameter process at discrete time points. Let $x_t$ be a continuous time parameter weakly stationary stochastic process ($-\infty < t < \infty$). The process has the spectral representation

$$x_t = \int_{-\infty}^{\infty}e^{it\lambda}\,dz(\lambda),$$

where $r_t = E\,x_{\tau+t}\bar{x}_\tau = \int_{-\infty}^{\infty}e^{it\lambda}\,dF(\lambda)$. Suppose that the process has been observed at the time points $n = \ldots, -1, 0, 1, \ldots$ and that we wish to estimate $x_t$ where $t$ is not an integer. Let the best linear least squares estimate of $x_t$ be

$$x_t^* = \int_{-\infty}^{\infty}c(\lambda)\,dz(\lambda).$$

The function $c(\lambda)$ has period $2\pi$ since it is an element of the closed linear manifold generated by $e^{in\lambda}$ where $n$ is integral. Since $x_t^*$ is the best linear least squares estimate of $x_t$ in terms of the $x_n$, $n$ integral, we must have

$$E(x_t - x_t^*)\bar{x}_n = \int_{-\infty}^{\infty}[e^{it\lambda} - c(\lambda)]e^{-in\lambda}\,dF(\lambda) = 0$$

for all integral $n$. This can be written in the more convenient form
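$$\int_0^{2\pi}e^{-in\lambda}[e^{it\lambda}\,dF_1(\lambda) - c(\lambda)\,dF_2(\lambda)] = 0$$

for all integral $n$, where

$$F_1(\lambda) = \sum_n e^{2\pi nit}[F(\lambda + 2\pi n) - F(2\pi n)],\qquad F_2(\lambda) = \sum_n[F(\lambda + 2\pi n) - F(2\pi n)],$$

since $c(\lambda)$ has period $2\pi$. But this cannot be satisfied unless

$$c(\lambda) = e^{it\lambda}\frac{dF_1(\lambda)}{dF_2(\lambda)},$$

and the error of interpolation is then

$$\sigma^2 = \int_0^{2\pi}\Big[1 - \Big|\frac{dF_1(\lambda)}{dF_2(\lambda)}\Big|^2\Big]\,dF_2(\lambda).$$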


If $dF_1$ and $dF_2$ vanish simultaneously on the same set $S$ then $c(\lambda)$ is undefined on this set and can be set equal to any function square integrable with respect to $dF$ on $S$ (for example a constant). The contribution to the error from this set $S$ is of course zero.
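For a spectrum consisting of finitely many point masses the folding into $F_1$ and $F_2$ can be done atom by atom, and the sketch below does exactly that (the example spectra are our own). Two masses $a$ and $b$ at frequencies differing by $2\pi$ are aliases of each other; for $t = 1/2$ the formula gives $\sigma^2 = (a+b) - (a-b)^2/(a+b) = 4ab/(a+b)$, while at integral $t$ the error vanishes, and it also vanishes when no two masses are aliases.

```python
import cmath
import math
from collections import defaultdict


def sampled_interpolation_error(masses, t, ndigits=9):
    """sigma^2 for estimating x_t from x_n, n integral, when F consists of
    point masses. masses is a list of (frequency, weight), weight > 0."""
    f1 = defaultdict(complex)   # folded masses of F_1 on [0, 2 pi)
    f2 = defaultdict(float)     # folded masses of F_2 on [0, 2 pi)
    for lam, weight in masses:
        base = round(lam % (2 * math.pi), ndigits)   # lambda = base + 2 pi n
        n = round((lam - base) / (2 * math.pi))
        f1[base] += cmath.exp(2j * math.pi * n * t) * weight
        f2[base] += weight
    # each atom contributes [1 - |dF_1 / dF_2|^2] dF_2
    return sum(f2[k] - abs(f1[k]) ** 2 / f2[k] for k in f2)


aliased = [(1.0, 0.3), (1.0 + 2 * math.pi, 0.7)]
print(sampled_interpolation_error(aliased, 0.5))   # close to 4ab/(a+b) = 0.84
```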


2.4.  Filtering  of  Stationary  Processes 

The problem of filtering usually consists in designing a filter to be applied to a process which has been disturbed by noise so as to recover the original process as closely as possible. Let us call the undisturbed process the signal $s_t$ and the noise $n_t$. The process $(s_t, n_t)$, $t = \ldots, -1, 0, 1, \ldots$, is supposed to be stationary with a known spectral matrix $F(\lambda)$.

Let us consider the simplest version of this problem when there is no internal noise generated in the filter. We now have to decide what class of filters is most suitable for our purpose. Let us confine ourselves to linear filters. Among these we could consider all those acting upon the past of the input $s_t + n_t$. Sometimes one may use filters utilizing the whole series $s_t + n_t$ for $-\infty < t < \infty$. A usual situation is one in which we have a parameter class $\mathfrak{F}_\alpha$ of filters available, where $\alpha$ is a finite dimensional vector. We can think of $s_t$ and $n_t$ as voltages and of $\mathfrak{F}_\alpha$ as a passive network with a finite number of meshes, $\alpha$ corresponding to its variable elements.

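If $\mathfrak{F}_\alpha$ has the frequency response function $\varphi_\alpha(\lambda)$, then the mean square error of the filter is given by

$$\sigma_\alpha^2 = \int_{-\pi}^{\pi}dG_\alpha(\lambda) = E|s_t - \mathfrak{F}_\alpha(s_t + n_t)|^2,$$

where $G_\alpha(\lambda)$ is the spectral distribution function of the error $s_t - \mathfrak{F}_\alpha(s_t + n_t)$. But we have

$$dG_\alpha(\lambda) = |\varphi_\alpha(\lambda) - 1|^2\,dF_{11}(\lambda) + |\varphi_\alpha(\lambda)|^2\,dF_{22}(\lambda) + 2\,\mathrm{Re}\{[\varphi_\alpha(\lambda) - 1]\overline{\varphi_\alpha(\lambda)}\,dF_{12}(\lambda)\}.$$

When the noise and signal are incoherent, the cross-spectrum vanishes and

$$\sigma_\alpha^2 = \int_{-\pi}^{\pi}\{|\varphi_\alpha(\lambda) - 1|^2\,dF_{11}(\lambda) + |\varphi_\alpha(\lambda)|^2\,dF_{22}(\lambda)\}.$$

We should choose $\alpha$ so that $\sigma_\alpha^2$ is minimized.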
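As an illustration of choosing $\alpha$, consider a hypothetical one-parameter class: the exponential smoother $y_t = \alpha y_{t-1} + (1-\alpha)(s_t + n_t)$, $0 \le \alpha < 1$, whose frequency response is $\varphi_\alpha(\lambda) = (1-\alpha)/(1 - \alpha e^{-i\lambda})$. Purely for the example we also assume an incoherent pair: a first order autoregressive signal with coefficient $\rho$ and unit innovation variance, and white noise of variance $q$. The sketch evaluates the last formula by the midpoint rule and picks $\alpha$ by a grid search; with $\alpha = 0$ (no filtering) the error is just the noise variance $q$.

```python
import cmath
import math


def filter_mse(alpha, rho, q, grid=1024):
    """sigma_alpha^2 for the hypothetical smoother
    y_t = alpha y_{t-1} + (1 - alpha)(s_t + n_t), with s_t AR(1)
    (coefficient rho, unit innovation variance) and n_t white noise of
    variance q, incoherent with s_t; midpoint rule on (-pi, pi)."""
    h = 2 * math.pi / grid
    total = 0.0
    for k in range(grid):
        lam = -math.pi + (k + 0.5) * h
        z = cmath.exp(-1j * lam)
        phi = (1 - alpha) / (1 - alpha * z)                 # frequency response
        f11 = 1 / (2 * math.pi * abs(1 - rho * z) ** 2)     # signal density
        f22 = q / (2 * math.pi)                             # noise density
        total += (abs(phi - 1) ** 2 * f11 + abs(phi) ** 2 * f22) * h
    return total


def best_alpha(rho, q, steps=100):
    """Grid search over alpha in {0, 1/steps, ..., (steps - 1)/steps}."""
    candidates = [i / steps for i in range(steps)]
    return min(candidates, key=lambda a: filter_mse(a, rho, q))


a = best_alpha(0.9, 2.0)
print(a, filter_mse(a, 0.9, 2.0), filter_mse(0.0, 0.9, 2.0))
```

The grid search is deliberately naive; any one-dimensional minimizer would serve, since only $\sigma_\alpha^2$ as a function of $\alpha$ is needed.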

As in prediction and interpolation problems, one of the crucial points in filtering is that we seldom know the spectral matrix $F(\lambda)$ a priori. The question of estimating a spectrum from empirical data will be discussed in detail in Chapters 4, 5 and 6.

2.5.  Treatment  of  Linear  Hypotheses  with  Specified  Spectrum 

Let $y_t = x_t + m_t$ be a process observed at $t = 1, 2, \ldots, n$ with mean value $m_t = Ey_t$ and stationary random component $x_t$, $Ex_t = 0$. In many cases one assumes that the means $m_t = Ey_t$ are given as linear expressions in unknown parameters $\gamma_1, \gamma_2, \ldots, \gamma_s$ ($s < n$),

$$m = \Phi\gamma,$$

where

$$m = \begin{pmatrix}m_1\\ \vdots\\ m_n\end{pmatrix},\qquad \gamma = \begin{pmatrix}\gamma_1\\ \vdots\\ \gamma_s\end{pmatrix},$$

and $\Phi$ is an $n \times s$ matrix of known constants and of rank $s$. An example is that of a polynomial or trigonometric regression. Denote the covariance matrix of the process $x_t$ by $R$, and let $y = (y_1, \ldots, y_n)'$ be the vector of observations.

We want to find good linear estimates $c = (c_1, \ldots, c_s)'$ of the $\gamma$'s. One possibility is to use the method of least squares, that is, to minimize

$$(y - \Phi c)^*(y - \Phi c)$$

with respect to $c$. Let $\Phi^*\Phi$ be a nonsingular matrix. Then

$$(y - \Phi c)^*(y - \Phi c) = y^*(I - \Phi(\Phi^*\Phi)^{-1}\Phi^*)y + ((\Phi^*\Phi)^{1/2}c - (\Phi^*\Phi)^{-1/2}\Phi^*y)^*((\Phi^*\Phi)^{1/2}c - (\Phi^*\Phi)^{-1/2}\Phi^*y),$$

so that the least squares estimate is

$$c_L = (\Phi^*\Phi)^{-1}\Phi^*y.$$

By $(\Phi^*\Phi)^{1/2}$ we mean the unique positive definite square root of the positive definite matrix $\Phi^*\Phi$, which has been assumed to be nonsingular. The estimate $c_L$ is unbiased since

$$Ec_L = (\Phi^*\Phi)^{-1}\Phi^*\Phi\gamma = \gamma.$$

One might go further and ask for an estimate $c_0$ optimal in the sense that among all linear unbiased estimates it has minimum variance. A linear unbiased estimate $a$ with covariance matrix $M_a$ is said to be minimum variance if $M_a \le M_b$ for all linear unbiased estimates $b$. We say that $M_a \le M_b$ if $M_b - M_a$ is nonnegative definite.

Assuming $R$ to be nonsingular, we shall show that

$$c_0 = (\Phi^*R^{-1}\Phi)^{-1}\Phi^*R^{-1}y$$

is a minimum variance linear unbiased estimate of $\gamma$. It is unbiased since

$$Ec_0 = (\Phi^*R^{-1}\Phi)^{-1}\Phi^*R^{-1}\Phi\gamma = \gamma.$$

Now if $c = Ly$ is an unbiased estimate of $\gamma$, it is clear that we must have

$$L\Phi = I.$$

In showing that $c_0$ is minimum variance we shall prove and then make use of what might be called a Schwarz inequality for matrices. Note that the covariance matrix of $c = Ly$ is

$$LRL^*.$$

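Let $l^*$, $\phi$ be two $n \times s$ matrices such that $ll^*$ is nonsingular. Let $\lambda$, $\mu$ be any two $s$-dimensional row vectors. Then

$$(\lambda l + \mu\phi^*)(\lambda l + \mu\phi^*)^* \ge 0,$$

and expanding we have

$$\lambda ll^*\lambda^* + \lambda l\phi\mu^* + \mu\phi^*l^*\lambda^* + \mu\phi^*\phi\mu^* \ge 0.$$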


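Let $A = ll^*$, $B = l\phi$, $C = \phi^*\phi$. The above inequality can then be rewritten as

$$\lambda A\lambda^* + \lambda B\mu^* + \mu B^*\lambda^* + \mu C\mu^* = (\lambda A^{1/2} + \mu B^*A^{-1/2})(\lambda A^{1/2} + \mu B^*A^{-1/2})^* + \mu[C - B^*A^{-1}B]\mu^* \ge 0.$$

On setting $\lambda = -\mu B^*A^{-1}$, we can see that

$$C \ge B^*A^{-1}B$$

or

$$\phi^*\phi \ge (l\phi)^*(ll^*)^{-1}(l\phi).$$

Note that there is equality if and only if there exist vectors $\lambda$, $\mu$, one of which is not the zero vector, such that $\lambda l + \mu\phi^* = 0$. On setting $\phi = R^{1/2}L^*$, $l = \Phi^*R^{-1/2}$ we have $\phi^*\phi = LRL^*$, $l\phi = (L\Phi)^* = I$ and $ll^* = \Phi^*R^{-1}\Phi$, and we find that

$$E(c - \gamma)(c - \gamma)^* = LRL^* \ge (\Phi^*R^{-1}\Phi)^{-1} = E(c_0 - \gamma)(c_0 - \gamma)^*$$

since $L\Phi = I$, proving that $c_0$ is minimum variance.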

In the discussion above we never made use of the assumption that $x_t$ is stationary. It is only introduced because that is the case of greatest interest for us. The estimates $c_0$ are sometimes called Markov estimates. One should note that if the process is normal, $c_0$ is the maximum likelihood estimate and is efficient. Tests and confidence regions based on $c_0$ can then be constructed in the usual way.

When $x_t$ is white noise we have $R = \sigma^2 I$, so that $c_L = c_0$. The least squares estimates are, strictly speaking, not as good as the estimates $c_0$ when the process is not white noise. On the other hand the Markov estimates require knowledge of the covariances, which are in general not given. We shall see in Chapter 7, however, that in many practical cases the least squares estimates are almost as good as the Markov estimates.
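A small numerical comparison, under assumptions of our own choosing: a linear trend $m_t = \gamma_1 + \gamma_2 t$, $t = 1, \ldots, 50$, with first order autoregressive errors, $R_{st} = \rho^{|s-t|}/(1-\rho^2)$, $\rho = 0.7$. The sketch forms $L = (\Phi^*\Phi)^{-1}\Phi^*$ for the least squares estimate and its covariance $LRL^*$, and the Markov covariance $(\Phi^*R^{-1}\Phi)^{-1}$. By the argument above the difference should be nonnegative definite, and the ratio of the slope variances measures how close least squares comes to the Markov estimate in this case.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def solve_columns(a, b):
    """Return a^{-1} b, solving for one column of b at a time."""
    return transpose([solve(a, list(col)) for col in zip(*b)])


n, rho = 50, 0.7
phi = [[1.0, float(t)] for t in range(1, n + 1)]                  # columns 1 and t
R = [[rho ** abs(s - t) / (1 - rho ** 2) for t in range(n)] for s in range(n)]
phi_t = transpose(phi)

L_ls = solve_columns(matmul(phi_t, phi), phi_t)                   # (Phi* Phi)^{-1} Phi*
cov_ls = matmul(matmul(L_ls, R), transpose(L_ls))                 # L R L*
info = matmul(phi_t, solve_columns(R, phi))                       # Phi* R^{-1} Phi
cov_markov = solve_columns(info, [[1.0, 0.0], [0.0, 1.0]])        # its inverse

print("slope efficiency of least squares:", cov_markov[1][1] / cov_ls[1][1])
```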

Before closing this section, we shall briefly discuss the existence of consistent estimates of regression coefficients. We shall say that an estimate is consistent if it converges in the mean square to the true value as $n \to \infty$. For simplicity we deal only with the case $s = 1$, so that there is just one unknown regression coefficient $m$ to estimate,

$$Ey_t = m\varphi_t.$$

The regression variable $\varphi_t$ is assumed to be of the Fourier-Stieltjes form

$$\varphi_t = \int_{-\pi}^{\pi}e^{it\lambda}\,d\mu(\lambda),$$

where $\mu(\lambda)$ is of bounded variation in $(-\pi, \pi)$ but not necessarily real.
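Assume that $y_t$, $t = -n, -n+1, \ldots, n-1, n$, has been observed. Consider the linear estimate

$$c = \sum_{t=-n}^{n}c_ty_t$$

and introduce the polynomial

$$P_n(z) = \sum_{t=-n}^{n}c_tz^t.$$

The estimate $c$ is unbiased if

$$1 = \sum_{t=-n}^{n}c_t\varphi_t = \int_{-\pi}^{\pi}P_n(e^{i\lambda})\,d\mu(\lambda).$$

The variance is

$$D^2[c] = \int_{-\pi}^{\pi}|P_n(e^{i\lambda})|^2\,dF(\lambda).$$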


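Dividing the integration interval $(-\pi, \pi)$ into a sufficiently fine grid $-\pi < \lambda_1^{(p)} < \lambda_2^{(p)} < \cdots < \lambda_p^{(p)} < \pi$ we get from the Schwarz inequality

$$1 = \Big|\int_{-\pi}^{\pi}P_n(e^{i\lambda})\,d\mu(\lambda)\Big|^2 \le \Big[\sum_\nu|P_n(e^{i\lambda_\nu^{(p)}})|\,|\Delta\mu(\lambda_\nu^{(p)})|\Big]^2 + \varepsilon \le \sum_\nu|P_n(e^{i\lambda_\nu^{(p)}})|^2\,\Delta F(\lambda_\nu^{(p)})\,\sum_\nu\frac{|\Delta\mu(\lambda_\nu^{(p)})|^2}{\Delta F(\lambda_\nu^{(p)})} + \varepsilon,$$

where $\Delta\mu(\lambda_\nu^{(p)})$, $\Delta F(\lambda_\nu^{(p)})$ denote the $\mu$ and $F$ mass in the $\nu$th interval and $\varepsilon$ can be made arbitrarily small by taking a sufficiently fine division. We know that

$$\sum_\nu\frac{|\Delta\mu(\lambda_\nu^{(p)})|^2}{\Delta F(\lambda_\nu^{(p)})}$$

is nondecreasing (see e.g. Wintner [1]) when the division is made finer. Denoting the limit (finite or infinite) by

$$H = \int_{-\pi}^{\pi}\frac{|d\mu(\lambda)|^2}{dF(\lambda)},$$

the Hellinger integral, we see that

$$D^2[c] \ge \frac{1}{H}.$$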


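This is really the greatest lower bound of $D^2[c]$. To see this, introduce the step function

$$\varphi_p(\lambda) = \frac{1}{H_p}\,\frac{\overline{\Delta\mu(\lambda_\nu^{(p)})}}{\Delta F(\lambda_\nu^{(p)})}\quad\text{for }\lambda\text{ in the }\nu\text{th interval},\qquad H_p = \sum_\nu\frac{|\Delta\mu(\lambda_\nu^{(p)})|^2}{\Delta F(\lambda_\nu^{(p)})}.$$

This can be uniformly approximated (except for small intervals around discontinuities) by trigonometric polynomials of high order. As

$$\int_{-\pi}^{\pi}\varphi_p(\lambda)\,d\mu(\lambda) = 1,\qquad \int_{-\pi}^{\pi}|\varphi_p(\lambda)|^2\,dF(\lambda) = \frac{1}{H_p},$$

we see that

$$\inf D^2[c] = \frac{1}{H}.$$

There is a consistent estimate of the regression coefficient if and only if the Hellinger integral of $\mu(\lambda)$ with respect to $F(\lambda)$ diverges. In the case of a constant mean value, $\mu(\lambda)$ is constant except for a jump at $\lambda = 0$, and we have a consistent estimate if and only if there is no spectral mass at $\lambda = 0$.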
We can apply the solution given above to the first interpolation problem considered in section 2.3. On setting $\mu(\lambda) = \lambda/2\pi$, so that $\varphi_0 = 1$ and $\varphi_t = 0$ for $t \ne 0$, we see that the minimization problem is in this case equivalent to minimizing

$$E\Big|x_0 - \sum_{t \ne 0}c_tx_t\Big|^2.$$

The greatest lower bound is then

$$\frac{4\pi^2}{\int_{-\pi}^{\pi}\frac{(d\lambda)^2}{dF(\lambda)}},$$

which we obtained in section 2.3 via another method. This result is due to Kolmogoroff [2].
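The Hellinger sums can be watched converging. The sketch below (an illustration of our own) takes $\mu(\lambda) = \lambda/2\pi$, so that $1/H$ is the interpolation error of section 2.3, and the autoregressive density $f(\lambda) = s^2/\{2\pi|1-\rho e^{-i\lambda}|^2\}$. As an assumption of the sketch, the cell masses $\Delta F$ are approximated by $f$ at the cell midpoint times the cell width. Because the coarse masses are sums of fine ones, the sums are nondecreasing under refinement by the elementary inequality $|a_1+a_2|^2/(b_1+b_2) \le |a_1|^2/b_1 + |a_2|^2/b_2$; the coarsest division (one cell) gives $1/H = $ variance of the process, the finest gives $s^2/(1+\rho^2)$.

```python
import math


def hellinger_sums(d_mu, d_f, levels):
    """Sums of |d_mu|^2 / d_f over dyadic coarsenings of a grid with
    2**levels cells, listed from the coarsest division to the finest."""
    sums = []
    for lev in range(levels + 1):
        width = 2 ** (levels - lev)            # fine cells per coarse cell
        total = 0.0
        for start in range(0, len(d_mu), width):
            mu_mass = sum(d_mu[start:start + width])
            f_mass = sum(d_f[start:start + width])
            total += abs(mu_mass) ** 2 / f_mass
        sums.append(total)
    return sums


rho, s2, levels = 0.6, 2.0, 12
cells = 2 ** levels
h = 2 * math.pi / cells
mids = [-math.pi + (k + 0.5) * h for k in range(cells)]
d_mu = [h / (2 * math.pi)] * cells                               # mu(lambda) = lambda / 2 pi
d_f = [s2 * h / (2 * math.pi * (1 - 2 * rho * math.cos(x) + rho ** 2)) for x in mids]
H = hellinger_sums(d_mu, d_f, levels)
print(1 / H[0], 1 / H[-1], s2 / (1 + rho ** 2))
```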
CHAPTER 3


STATISTICAL ANALYSIS OF PARAMETRIC MODELS


3.0. Introduction

From this point on we shall deal with statistical problems arising when the covariance structure of the process is not completely known. In the present chapter we discuss some of the techniques that have been suggested in the last two or three decades and that have been studied most intensively. These techniques have in common the fact that they are based on the a priori assumption of finite parameter models, usually a moving average or autoregressive scheme of low order plus a regression component.

3.1.  Periodogram  Analysis 

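One of the simplest models that has been proposed for time series analysis is the scheme of hidden periodicities,

$$x_t = \sum_{\nu=1}^{p}(a_\nu\cos\lambda_\nu t + b_\nu\sin\lambda_\nu t) + \xi_t,$$

where $a_\nu$, $b_\nu$ and $\lambda_\nu$ are real constants with $0 < \lambda_\nu < \pi$, and where $\xi_t$ is pure white noise. We wish to detect the periods $2\pi/\lambda_\nu$ that have been masked by the random disturbances $\xi_t$. For this purpose the following statistic has been proposed:

$$I_n(\lambda) = \frac{1}{2\pi n}\Big|\sum_{t=1}^{n}x_te^{-it\lambda}\Big|^2 = \frac{A^2(\lambda)}{4\pi} + \frac{B^2(\lambda)}{4\pi},$$

where

$$A(\lambda) = \sqrt{\frac{2}{n}}\sum_{t=1}^{n}x_t\cos t\lambda,\qquad B(\lambda) = \sqrt{\frac{2}{n}}\sum_{t=1}^{n}x_t\sin t\lambda. \qquad (1)$$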


$I_n(\lambda)$ is called the periodogram and is clearly suggested by a Fourier analysis treating the time series as if it were just the undisturbed trigonometric sum. Writing $x_t$ in the complex form

$$x_t = \sum_{\nu=1}^{p}\alpha_\nu e^{it\lambda_\nu} + \xi_t,$$

we get

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}x_te^{-it\lambda} = \frac{1}{\sqrt{n}}\sum_{\nu=1}^{p}\alpha_\nu\sum_{t=1}^{n}e^{it(\lambda_\nu - \lambda)} + \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\xi_te^{-it\lambda}.$$

The second term has mean zero and variance 1. The first term evidently tends to zero if $\lambda \ne \lambda_\nu$, $\nu = 1, 2, \ldots, p$; otherwise it is of the order $\sqrt{n}$. Hence we can expect, for large samples, that the periodogram has high peaks at the frequencies $\lambda_\nu$ and is otherwise of moderate size. On computing the periodogram one could then infer that its great peaks correspond to real periods.
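A minimal sketch of the periodogram of a simulated series with one hidden period. The example is ours: amplitude 2 at the frequency $l_{20} = 2\pi\cdot 20/201$, unit normal noise, and a fixed seed. Evaluated at the points $l_r = 2\pi r/n$, $r = 1, \ldots, 100$, the largest ordinate should sit at $r = 20$ and stand far above the others.

```python
import cmath
import math
import random


def periodogram(x, lam):
    """I_n(lambda) = |sum_{t=1}^n x_t e^{-i t lambda}|^2 / (2 pi n)."""
    n = len(x)
    s = sum(xt * cmath.exp(-1j * t * lam) for t, xt in enumerate(x, start=1))
    return abs(s) ** 2 / (2 * math.pi * n)


rng = random.Random(0)
n = 201                                   # odd, n = 2m + 1
m = (n - 1) // 2
lam_true = 2 * math.pi * 20 / n           # hidden frequency l_20
x = [2.0 * math.cos(lam_true * t) + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]

values = [periodogram(x, 2 * math.pi * r / n) for r in range(1, m + 1)]
peak_r = 1 + max(range(m), key=lambda i: values[i])
print(peak_r, values[peak_r - 1])
```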

Naturally the question of the statistical significance of these periods arises, and we shall mention a test due to R. A. Fisher [1]. It has been observed that the periodogram often has a very irregular appearance, and many research workers were tempted to attribute real meaning to the many peaks of the periodogram. In Fisher's test the null hypothesis is that the process has no period, that is $x_t = \xi_t$, and that the $\xi$'s are normally distributed with unknown mean $m$ and standard deviation $\sigma$.

Let the number of observed values be odd, say $n = 2m + 1$, and consider $m$ values of the periodogram at the points $l_r = 2\pi r/(2m+1)$, $r = 1, 2, \ldots, m$. Because of the orthogonality of the trigonometric coefficients in (1), the stochastic variables

$$A(l_r),\ B(l_r),\qquad r = 1, 2, \ldots, m,$$

are $2m$ independent normal variables with mean zero and variance $\sigma^2$. Hence

$$s_r = A^2(l_r) + B^2(l_r),\qquad r = 1, 2, \ldots, m,$$

are independent, each distributed as $\sigma^2$ times a $\chi^2$-variable with two degrees of freedom. The joint frequency function of $s_1, s_2, \ldots, s_m$ is then

$$c^me^{-c\sum_{\nu=1}^{m}z_\nu},\qquad c = \frac{1}{2\sigma^2},$$

for $z_1, z_2, \ldots, z_m > 0$ and 0 otherwise. Let $y_r$ be the $r$th largest value of $s_1, s_2, \ldots, s_m$. Introduce

$$g = \frac{y_r}{y_1 + y_2 + \cdots + y_m}.$$
We shall derive the characteristic function of the variable $h = 1/g$. As the distribution of $h$ does not depend upon the scale factor $c$, we can take it as 1. We then get

$$\varphi(t) = Ee^{ith} = \frac{m!}{(m-r)!\,(r-1)!}\int_0^{\infty}e^{r(it-y)}\big[1 - e^{it-y}\big]^{m-r}\Big(\frac{y}{y - it}\Big)^{m-1}dy.$$

The variable $h$ has an absolutely continuous distribution with the frequency function

$$\frac{1}{2\pi}\,\frac{m!}{(m-r)!\,(r-1)!}\int_{-\infty}^{\infty}\int_0^{\infty}e^{-itx}\,e^{r(it-y)}\big[1 - e^{it-y}\big]^{m-r}\Big(\frac{y}{y - it}\Big)^{m-1}dy\,dt,$$

as the integrand is bounded by the expression

$$e^{-ry}\big[1 + e^{-y}\big]^{m-r}\Big(\frac{y^2}{y^2 + t^2}\Big)^{\frac{m-1}{2}},$$

which is integrable if $m > 2$. Developing $[1 - e^{it-y}]^{m-r}$ and using

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{i\xi x}}{(K + i\xi)^q}\,d\xi = \begin{cases}\dfrac{x^{q-1}e^{-Kx}}{(q-1)!}, & x > 0,\ K > 0,\\[4pt] 0, & x < 0,\ K > 0,\end{cases}$$

we get for the frequency function of $g = 1/h$

$$\frac{m!\,(m-1)}{(r-1)!}\sum_{j=r}^{[1/x]}\frac{(-1)^{j-r}}{(m-j)!\,(j-r)!}\,(1 - jx)^{m-2}.$$

Integrating this between $x$ and $1/r$ we obtain

$$P(g > x) = \frac{m!}{(r-1)!}\sum_{j=r}^{[1/x]}\frac{(-1)^{j-r}}{j\,(m-j)!\,(j-r)!}\,(1 - jx)^{m-1}.$$


The most important case is that of the greatest value of $s_1, s_2, \ldots, s_m$, i.e. $r = 1$. We then have

$$P(g > x) = m(1-x)^{m-1} - \frac{m(m-1)}{2}(1-2x)^{m-1} + \frac{m(m-1)(m-2)}{3\cdot 2}(1-3x)^{m-1} - \cdots, \qquad (2)$$

where the summation should extend as long as the quantities $1 - jx$ inside the brackets are positive.

The practical application of this is as follows. Compute the values of the periodogram $s_1, s_2, \ldots, s_m$ and $g = (\max s_\nu)/(s_1 + s_2 + \cdots + s_m)$. If $g < g_p$, where $g_p$ is some appropriate percentile of the distribution given by (2), we accept the hypothesis that no period is present in the data. If $g > g_p$, we reject the hypothesis. Intuitively one would then believe that $2\pi/l_\nu$ is a real period if $l_\nu$ corresponds to $\max s_\nu$. Although this would not be in strict accordance with formal statistical theory, it is the way one usually proceeds in practice.
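The tail probability and a critical value $g_p$ can be computed directly. The sketch below (helper names are ours) implements the general formula for $P(g > x)$, finds the upper 5 per cent point for $r = 1$ by bisection, and checks the tail against a seeded Monte Carlo simulation in which, under the null hypothesis, the $s_r$ are replaced by independent exponential variables (the scale does not matter for $g$).

```python
import math
import random


def fisher_tail(x, m, r=1):
    """P(g > x), g = (r-th largest of s_1..s_m) / (s_1 + ... + s_m), 0 < x <= 1."""
    total = 0.0
    for j in range(r, min(m, int(1 / x) + 1) + 1):
        if 1 - j * x <= 0:              # only terms with 1 - jx > 0 contribute
            continue
        total += ((-1) ** (j - r) * (1 - j * x) ** (m - 1)
                  / (j * math.factorial(m - j) * math.factorial(j - r)))
    return math.factorial(m) / math.factorial(r - 1) * total


def fisher_critical(m, level=0.05):
    """Point g_p with P(g > g_p) = level for r = 1, found by bisection."""
    lo, hi = 1.0 / m, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if fisher_tail(mid, m) > level:
            lo = mid
        else:
            hi = mid
    return hi


def simulated_tail(x, m, trials=20000, seed=1):
    """Monte Carlo estimate of P(g > x) for r = 1 under the null hypothesis."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = [rng.expovariate(1.0) for _ in range(m)]
        hits += max(s) / sum(s) > x
    return hits / trials


m = 10
print(fisher_critical(m), fisher_tail(0.25, m), simulated_tail(0.25, m))
```

In use, one computes $g$ from the $m$ periodogram values and rejects the hypothesis of no period when $g$ exceeds `fisher_critical(m)`.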

Once we have decided the order of the trigonometric polynomial and its frequencies $\lambda_\nu$ one may wish to estimate the coefficients $a_\nu$, $b_\nu$. As the correlation matrix of $\xi_t$ is known (it is the identity matrix) we can use section 2.5 to get the minimum variance estimates. We have in this case (where the $\lambda$'s are some of the $l_r$'s)

$$\Phi = (e^{it\lambda_\nu};\ t = 1, 2, \ldots, n,\ \nu = 1, 2, \ldots, p)_{n \times p}.$$

Hence

$$\Phi^*\Phi = \Big(\sum_{t=1}^{n}e^{-it\lambda_j}e^{it\lambda_k};\ j, k = 1, 2, \ldots, p\Big)_{p \times p} = nI,$$

and the minimum variance estimates are

$$\alpha_\nu^* = \frac{1}{n}\sum_{t=1}^{n}e^{-it\lambda_\nu}x_t.$$

Confidence intervals for $\alpha_\nu$ can be obtained in the standard way when $\xi_t$ is normal pure white noise with known or unknown variance (Cramér [3], p. 550).
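With the $\lambda$'s chosen among the $l_r$, the estimate is simply a normalized Fourier coefficient. The sketch below uses a noise-free example of our own, $x_t = a\cos\lambda t + b\sin\lambda t$ at a Fourier frequency, and recovers $a$ and $b$ from $\alpha^* = (a - ib)/2$, the coefficient of $e^{it\lambda}$ in that sum; the companion term at $-\lambda$ drops out by the orthogonality used above.

```python
import cmath
import math


def harmonic_estimate(x, lam):
    """alpha* = (1/n) sum_{t=1}^n e^{-i t lam} x_t."""
    n = len(x)
    return sum(cmath.exp(-1j * t * lam) * xt for t, xt in enumerate(x, start=1)) / n


n, r = 101, 7
lam = 2 * math.pi * r / n                  # the Fourier frequency l_r
a, b = 1.5, -0.8                           # hypothetical coefficients
x = [a * math.cos(lam * t) + b * math.sin(lam * t) for t in range(1, n + 1)]
alpha = harmonic_estimate(x, lam)
a_hat, b_hat = 2 * alpha.real, -2 * alpha.imag
print(a_hat, b_hat)
```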

3.2.  The  variate  difference  method 

Another scheme that has been proposed is

$$x_t = \sum_{\nu=0}^{p}\alpha_\nu t^\nu + \xi_t = m_t + \xi_t,$$

where the regression coefficients $\alpha_\nu$ are unknown. If $p$ is known, we can use the minimum variance estimates to estimate $\alpha_\nu$ in the same way as we did in section 3.1.

However, if $p$ is not known, it has to be estimated, and for this purpose the so-called variate difference method has been suggested (see Tintner [1]). Introduce the difference operator of order $q$:

$$y_t = \Delta^qx_t = x_{t+q} - \binom{q}{1}x_{t+q-1} + \binom{q}{2}x_{t+q-2} - \cdots + (-1)^qx_t.$$

We have

$$Ey_t = \Delta^qm_t$$

and

$$D^2[y_t] = \sum_{k=0}^{q}\binom{q}{k}^2D^2[\xi_t] = \binom{2q}{q}D^2[\xi_t].$$

This is used in the following way. First the empirical variance of the original sample $x_t$ is computed. Then the first differences of the series $x_t$ are found and their empirical variance is divided by $\binom{2}{1} = 2$. Then the second differences are found and their empirical variance is now divided by $\binom{4}{2} = 6$, and so on. The series of numbers obtained after a while (when we reach the $p$th difference) should stabilize except for random fluctuations. The order of the difference when this happens can be taken as an estimate of $p$, and the empirical variance of $y_t = \Delta^px_t$ divided by $\binom{2p}{p}$ is an estimate of the variance of the disturbance $\xi_t$.
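A minimal sketch of the procedure on simulated data; the example is ours: a quadratic trend $0.01t^2$ plus unit normal noise, $n = 500$, fixed seed. Here $p = 2$, so the divided variances should fall to about 1 from the second difference on. The empirical variance is taken about the sample mean, which removes the constant $\Delta^pm_t$.

```python
import math
import random


def difference(x):
    """First differences x_{t+1} - x_t."""
    return [b - a for a, b in zip(x, x[1:])]


def empirical_variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def variate_difference_ratios(x, q_max):
    """Empirical variance of the q-th differences divided by C(2q, q), q = 0..q_max."""
    ratios, y = [], list(x)
    for q in range(q_max + 1):
        ratios.append(empirical_variance(y) / math.comb(2 * q, q))
        y = difference(y)
    return ratios


rng = random.Random(1)
x = [0.01 * t ** 2 + rng.gauss(0.0, 1.0) for t in range(500)]   # p = 2, D^2[xi] = 1
ratios = variate_difference_ratios(x, 4)
print([round(v, 3) for v in ratios])
```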

3.3.  Effect  of  Smoothing  of  Time  Series  (Slutzky's  Theorem) 

In the older literature on time series analysis one very often finds techniques in which the sample is first smoothed by a series of linear operations of simple type. The smoothing is supposed to eliminate the random fluctuations. The resulting series is then subjected to some analysis, say Fourier analysis, to isolate the periodic components. The danger of such a procedure is pointed out by a theorem of Slutzky [1] (see also Moran [1]).

Let $L$ be a finite linear operator

$$Lu_t = a_0u_{t+p} + a_1u_{t+p-1} + \cdots + a_pu_t$$

with the frequency response function

$$\varphi(\lambda) = \sum_{\nu=0}^{p}a_\nu e^{i(p-\nu)\lambda}$$

corresponding to the filter $L$. Suppose that the process $x_t$ is stationary with the spectral distribution function $F(\lambda)$. Then $Lx_t$ is a stationary process with the spectral distribution function

$$\int_{-\pi}^{\lambda}|\varphi(l)|^2\,dF(l).$$


Consider the normed process

$$y_t = \frac{Lx_t}{D[Lx_t]}.$$

Its spectral distribution function is

$$G(\lambda) = \frac{\int_{-\pi}^{\lambda}|\varphi(l)|^2\,dF(l)}{\int_{-\pi}^{\pi}|\varphi(l)|^2\,dF(l)}. \qquad (1)$$


Consider a finite linear operator $M$ with the frequency response function $m(\lambda)$. The function $|m(\lambda)|^2$ is continuous. Assume that it has a unique absolute maximum, say at $\lambda = \lambda_0$. The case of several maxima can be treated in a similar way. Put $L = M^n$ and assume that $\lambda = \lambda_0$ is a point in the spectrum of $x_t$. We clearly have $\varphi(\lambda) = m(\lambda)^n$, and (1) defines a sequence of distribution functions in $(-\pi, \pi)$. Suppose that $-\pi < \lambda < \lambda_0$. Then

$$\int_{-\pi}^{\pi}|m(l)|^{2n}\,dF(l) \ge [F(\lambda_0 + \varepsilon) - F(\lambda_0 - \varepsilon)]\,[|m(\lambda_0)| - \delta]^{2n},$$

where $\varepsilon, \delta > 0$. On the other hand

$$\int_{-\pi}^{\lambda}|m(l)|^{2n}\,dF(l) \le [F(\lambda) - F(-\pi)]\,[|m(\lambda_0)| - \delta']^{2n},$$

where $\delta' > \delta$ when $\varepsilon$ is sufficiently small. If $\lambda_0$ is a point of increase of the spectral distribution function $F(\lambda)$, so that $F(\lambda_0 + \varepsilon) > F(\lambda_0 - \varepsilon)$, it follows that $G(\lambda)$ tends to zero when $n$ tends to infinity. One can similarly show that $G(\lambda)$ tends to one when $\lambda > \lambda_0$ and $n$ increases. Hence the behaviour of $y_t$ over any fixed finite interval will approximate that of the harmonic with frequency $\lambda_0$ if $n$ is sufficiently large.
Slutzky considers two linear operators,

$$Mu_t = u_{t+1} + u_t,\qquad Nu_t = u_{t+1} - u_t.$$

Let $L = M^\mu N^\nu$ and let $\mu$ and $\nu$ tend to infinity with $\nu/\mu = \alpha$, $0 < \alpha < 1$. Clearly in this case

$$\varphi(\lambda) = (1 + e^{i\lambda})^\mu(e^{i\lambda} - 1)^\nu.$$

The function $|\varphi(\lambda)|^2$ has an absolute maximum at $\lambda = \pm\lambda_0$, where

$$\cos\lambda_0 = \frac{\mu - \nu}{\mu + \nu} = \frac{1 - \alpha}{1 + \alpha}.$$

One can now again prove that the smoothed series converges to a harmonic with frequency $\lambda_0$ (see Moran [1], p. 76).

These  results  indicate  that  although  smoothing  can  be  used  to  eliminate 
random  disturbances  it  also  may  introduce  periodicities  which  are  not 
present  in  the  original  process.  It  should  also  be  pointed  out  that  existing 
periods  or  peaks  in  the  spectral  density  may  be  obliterated  by  a  smoothing 
operation. 
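The concentration of the normed spectrum can be seen directly when $x_t$ is white noise, so that $dF$ is uniform and $G$ is proportional to $\int|\varphi|^2$. The sketch below (our own illustration) uses $|\varphi(\lambda)|^2 = (2 + 2\cos\lambda)^\mu(2 - 2\cos\lambda)^\nu$, locates its maximum numerically, compares it with $\cos\lambda_0 = (\mu-\nu)/(\mu+\nu)$, and shows the share of $G$ within $0.2$ of $\pm\lambda_0$ growing as $\mu$ and $\nu$ increase with $\alpha = 1/2$ fixed. Logarithms are used to avoid overflow.

```python
import math


def log_gain(lam, mu, nu):
    """log |phi(lambda)|^2 for L = M^mu N^nu."""
    c = math.cos(lam)
    return mu * math.log(2 + 2 * c) + nu * math.log(2 - 2 * c)


def slutzky_peak(mu, nu, grid=20000):
    """Numerical argmax of |phi(lambda)|^2 on (0, pi), midpoint grid."""
    lams = [(k + 0.5) * math.pi / grid for k in range(grid)]
    return max(lams, key=lambda lam: log_gain(lam, mu, nu))


def mass_near_peak(mu, nu, width=0.2, grid=20000):
    """Share of G within `width` of +-lambda_0 when x_t is white noise."""
    lam0 = math.acos((mu - nu) / (mu + nu))
    lams = [-math.pi + (k + 0.5) * 2 * math.pi / grid for k in range(grid)]
    logs = [log_gain(lam, mu, nu) for lam in lams]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]    # rescaled to avoid overflow
    near = sum(w for lam, w in zip(lams, weights) if abs(abs(lam) - lam0) < width)
    return near / sum(weights)


print(slutzky_peak(40, 20), math.acos(1 / 3))
print([round(mass_near_peak(10 * k, 5 * k), 3) for k in (1, 2, 4, 8)])
```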

3.4.  Serial  Correlation  Coefficients  for  Normal  White  Noise 

A property common to both schemes treated in 3.1 and 3.2 is that the disturbance is pure white noise. In most cases one does not have any a priori knowledge concerning this. A need was felt for techniques to determine whether a given time series could be regarded as a sample from a pure white noise process. In this section we shall review briefly a few of the many results in this direction.

Assume $x_t$ to be a normally distributed process with unknown mean value $m$. We want to test the hypothesis that $x_t$ is white noise. It is clear that one intuitively plausible statistic to consider is the first serial correlation coefficient (of lag one)

$$b = \frac{c_1}{c_0},$$

where

$$c_p = \sum_{\nu=1}^{n-p}(x_\nu - \bar{x})(x_{\nu+p} - \bar{x}),\qquad p \ge 0.$$


One  could  consider  serial  correlation  coefficients  of  higher  lag  but  we  shall 
not  discuss  them  here. 
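The distribution of $b$ under the null hypothesis cannot be obtained in closed form. Because of this difficulty one is led to consider instead the circular serial correlation coefficient

$$\bar{b} = \frac{\bar{c}_1}{\bar{c}_0},\qquad\text{where }\bar{c}_p = \sum_{\nu=1}^{n}(x_\nu - \bar{x})(x_{\nu+p} - \bar{x}),$$

and we have assumed the circular definition $x_{\nu+n} = x_\nu$. Consider the two quadratic forms

$$\mathcal{C} = x^*Cx,\qquad \gamma = x^*Vx,$$

where

$$C = \frac{1}{2}\begin{pmatrix}0&1&0&\cdots&0&1\\ 1&0&1&\cdots&0&0\\ \vdots&\ddots&\ddots&\ddots&&\vdots\\ 0&0&\cdots&1&0&1\\ 1&0&\cdots&0&1&0\end{pmatrix} - \frac{1}{n}\begin{pmatrix}1&1&\cdots&1\\ 1&1&\cdots&1\\ \vdots&\vdots&&\vdots\\ 1&1&\cdots&1\end{pmatrix},\qquad V = I - \frac{1}{n}\begin{pmatrix}1&1&\cdots&1\\ 1&1&\cdots&1\\ \vdots&\vdots&&\vdots\\ 1&1&\cdots&1\end{pmatrix}.$$

Now $\bar{b} = \mathcal{C}/\gamma$. Moreover $CV = VC$, so that there is an orthogonal transformation taking both $C$ and $V$ into diagonal form at the same time.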

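Let us now derive expressions for the eigenvalues of a circular matrix

$$A = \begin{pmatrix}a_1&a_2&a_3&\cdots&a_n\\ a_n&a_1&a_2&\cdots&a_{n-1}\\ \vdots&&&&\vdots\\ a_2&a_3&a_4&\cdots&a_1\end{pmatrix}.$$

Now $A = \sum_{s=1}^{n}a_sJ^{s-1}$, where

$$J = \begin{pmatrix}0&1&0&\cdots&0&0\\ 0&0&1&\cdots&0&0\\ \vdots&&&\ddots&&\vdots\\ 0&0&0&\cdots&0&1\\ 1&0&0&\cdots&0&0\end{pmatrix}.$$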
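The eigenvalues of $J$ are $w_k = e^{2\pi ik/n}$, $k = 1, 2, \ldots, n$. The eigenvalues of $A$ are then

$$\lambda_k = \sum_{s=1}^{n}a_sw_k^{s-1} = \sum_{s=1}^{n}a_se^{2\pi i(s-1)k/n},\qquad k = 1, 2, \ldots, n.$$

In the first case take $A = C$, that is, $a_1 = -\frac{1}{n}$, $a_2 = a_n = \frac{1}{2} - \frac{1}{n}$ and $a_s = -\frac{1}{n}$ for $3 \le s \le n-1$. The eigenvalues of $C$ are

$$\lambda_j = -\frac{1}{n}\sum_{s=1}^{n}w_j^{s-1} + \frac{1}{2}(w_j + w_j^{-1}) = \cos\frac{2\pi j}{n},\quad j = 1, 2, \ldots, n-1;\qquad \lambda_n = 0.$$

The eigenvalues of $V$ are

$$\mu_j = 1,\quad j = 1, 2, \ldots, n-1;\qquad \mu_n = 0.$$

The eigenvalues $\lambda_n$ and $\mu_n$ correspond to the same eigenvector

$$\Big(\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\Big),$$

as is immediately verified. Hence

$$\mathcal{C} = \sum_{j=1}^{n-1}y_j^2\cos\frac{2\pi j}{n},\qquad \gamma = \sum_{j=1}^{n-1}y_j^2, \qquad (1)$$

where $(y_1, y_2, \ldots, y_n)$ is the transform of $(x_1, x_2, \ldots, x_n)$ under the orthogonal transformation, so that $y_1, \ldots, y_{n-1}$ still are independent normal variables with mean zero and common standard deviation.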

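Let us now compute the distribution of $\bar{b}$ when $n$ is odd. For $n$ even the same method can be used and we refer to R. L. Anderson [1] for the result. Consider

$$P(\bar{b} > \alpha) = P(\mathcal{C} - \alpha\gamma > 0).$$

Taking the common standard deviation of the $x_\nu$ equal to one, which does not affect $\bar{b}$, the quadratic form $Q = x^*(C - \alpha V)x$ has the characteristic function (see Cramér [3], p. 233)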
$$\varphi(t) = |I - 2it(C - \alpha V)|^{-1/2} = \prod_{k=1}^{n-1}[1 - 2it(\lambda_k - \alpha)]^{-1/2}.$$

The eigenvalues are equal in pairs, $\lambda_k = \lambda_{n-k}$ for $k = 1, 2, \ldots, n-1$ (see formula (1)). We have

$$\varphi(t) = \prod_{k=1}^{(n-1)/2}[1 - 2it(\lambda_k - \alpha)]^{-1}.$$

$Q$ then has the frequency function ($n > 3$)

$$f(u) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iut}\varphi(t)\,dt = \frac{1}{2}\sum_{\lambda_k > \alpha}e^{-\frac{u}{2(\lambda_k - \alpha)}}\,\frac{(\lambda_k - \alpha)^{\frac{n-5}{2}}}{a_k},\qquad u > 0,$$

by a simple application of the calculus of residues. We have put

$$a_k = \prod_{j=1,\ j \ne k}^{(n-1)/2}(\lambda_k - \lambda_j).$$

Hence

$$P(\bar{b} > \alpha) = \int_0^{\infty}f(u)\,du = \sum_{\lambda_k > \alpha}\frac{(\lambda_k - \alpha)^{\frac{n-3}{2}}}{a_k}.$$

This can be used to get a one- or two-sided test of significance for the null hypothesis. For tables see R. L. Anderson [1].
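The tail probability is straightforward to evaluate. The sketch below (function names are ours) implements the last formula for odd $n$ and compares it with a seeded simulation of $\bar{b}$ computed from independent normal samples. For $\alpha$ below every $\lambda_k$ the sum equals one, as a probability must; for $n = 5$ the formula makes $\bar{b}$ uniform between $\cos(4\pi/5)$ and $\cos(2\pi/5)$.

```python
import math
import random


def circular_tail(alpha, n):
    """P(b_bar > alpha) under the null hypothesis, n odd."""
    K = (n - 1) // 2
    lam = [math.cos(2 * math.pi * k / n) for k in range(1, K + 1)]
    total = 0.0
    for k, lk in enumerate(lam):
        if lk > alpha:
            a_k = math.prod(lk - lj for j, lj in enumerate(lam) if j != k)
            total += (lk - alpha) ** ((n - 3) // 2) / a_k
    return total


def circular_serial_corr(x):
    """b_bar = c_1 / c_0 with the circular definition x_{n+1} = x_1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c1 = sum(d[i] * d[(i + 1) % n] for i in range(n))
    c0 = sum(v * v for v in d)
    return c1 / c0


def simulated_tail(alpha, n, trials=20000, seed=2):
    """Monte Carlo estimate of P(b_bar > alpha) for normal white noise."""
    rng = random.Random(seed)
    hits = sum(circular_serial_corr([rng.gauss(0.0, 1.0) for _ in range(n)]) > alpha
               for _ in range(trials))
    return hits / trials


print(circular_tail(0.2, 11), simulated_tail(0.2, 11))
```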

Madow [1] has derived the distribution of $b$ under the alternative hypothesis that $y_t = x_t - m$ is a first order autoregressive scheme, so that its frequency function is (again using the circular definition)

$$p(x_1, x_2, \ldots, x_n \mid A, B) = \text{const}\cdot\exp\left\{-\frac{1}{2}\left[A\sum_{\nu=1}^{n}(x_\nu - m)^2 + 2B\sum_{\nu=1}^{n}(x_{\nu+1} - m)(x_\nu - m)\right]\right\},$$

where $A = (1 + \rho^2)/\sigma^2$, $B = -\rho/\sigma^2$ when $y_t$ satisfies the stochastic difference equation

$$y_{t+1} - \rho y_t = \varepsilon_t, \qquad |\rho| < 1.$$

Note that the sample mean $\bar x$ is independent of $\gamma$ and $b$ whatever the values of $A$ and $B$. Under the null hypothesis $\gamma$ and $b$ are independent. This is most easily seen by a geometric argument: the conditional distribution of $b$ given $\gamma$ (see formula (1)) does not depend on $\gamma$, and the independence follows. $\gamma$ has a $\chi^2$-distribution with $n - 1$ degrees of freedom. Hence $\gamma$ and $b$ have the joint frequency function

$$g(\gamma, b) = \text{const}\cdot\gamma^{(n-3)/2}e^{-\gamma/2}\sum_{\lambda_k > b}\frac{(\lambda_k - b)^{(n-5)/2}}{\alpha_k}$$

under the null hypothesis.

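It is clear that $(b, \gamma)$ is a sufficient statistic for $(A, B)$ in the sense that

$$p(x_1, x_2, \ldots, x_n \mid A, B) = q(b, \gamma \mid A, B)\, r(x_1, x_2, \ldots, x_n \mid b, \gamma),$$

where $r(x_1, x_2, \ldots, x_n \mid b, \gamma)$ does not depend upon $A$ and $B$. Hence

$$q(b, \gamma \mid A, B) = q(b, \gamma \mid 1, 0)\cdot\text{const}\cdot\frac{e^{-\frac{1}{2}(A\gamma + 2Bb\gamma)}}{e^{-\frac{1}{2}\gamma}} = \text{const}\cdot\gamma^{(n-3)/2}e^{-\frac{1}{2}(A + 2Bb)\gamma}\sum_{\lambda_k > b}\frac{(\lambda_k - b)^{(n-5)/2}}{\alpha_k}.$$

Integrating over $0 < \gamma < \infty$ we get the marginal frequency function of $b$

$$\text{const}\cdot\left(\frac{A}{2} + Bb\right)^{-(n-1)/2}\sum_{\lambda_k > b}\frac{(\lambda_k - b)^{(n-5)/2}}{\alpha_k},$$

which is the desired expression.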
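Madow's marginal density can be explored numerically. The sketch below normalises $(A/2 + Bb)^{-(n-1)/2}\sum_{\lambda_k > b}(\lambda_k - b)^{(n-5)/2}/\alpha_k$ with a midpoint rule instead of computing the constant in closed form; the grid size and the names are our own choices. For $\rho = 0$ the mean of $b$ should be $-1/(n-1)$ (a standard consequence of the null distribution), and a positive $\rho$ should shift mass toward larger values of $b$.

```python
import math


def distinct_eigenvalues(n):
    return [math.cos(2 * math.pi * k / n) for k in range(1, (n - 1) // 2 + 1)]


def null_shape(n, b):
    """Sum over lambda_k > b of (lambda_k - b)^((n-5)/2) / alpha_k (null density of b up to a constant)."""
    lam = distinct_eigenvalues(n)
    power = (n - 5) // 2
    s = 0.0
    for k, lk in enumerate(lam):
        if lk > b:
            alpha_k = 1.0
            for j, lj in enumerate(lam):
                if j != k:
                    alpha_k *= lk - lj
            s += (lk - b) ** power / alpha_k
    return s


def madow_density(n, rho, sigma2=1.0, cells=20000):
    """Midpoints, normalised density values and cell width for the marginal density of b
    under the circular AR(1) alternative, const * (A/2 + B*b)^(-(n-1)/2) * null_shape(n, b),
    with A = (1 + rho^2)/sigma2 and B = -rho/sigma2."""
    lam = distinct_eigenvalues(n)
    lo, hi = min(lam), max(lam)
    big_a = (1 + rho * rho) / sigma2
    big_b = -rho / sigma2
    h = (hi - lo) / cells
    mids = [lo + (i + 0.5) * h for i in range(cells)]
    vals = [(big_a / 2 + big_b * b) ** (-(n - 1) / 2) * null_shape(n, b) for b in mids]
    total = sum(vals) * h  # numerical normalising constant (midpoint rule)
    return mids, [v / total for v in vals], h


def mean_of_b(n, rho):
    mids, dens, h = madow_density(n, rho)
    return sum(b * d for b, d in zip(mids, dens)) * h


if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.5):
        print(rho, round(mean_of_b(7, rho), 4))
```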


3.5.  Approximate  Distributions  of  Quadratic  Forms 

The only reason for dealing with the circular coefficient $b$ instead of with the noncircular one is that it does not seem possible to get an explicit expression for the distribution of the noncircular coefficient. Let us compute the eigenvalues of the quadratic form $C$ appearing in the numerator of the noncircular serial correlation coefficient. We have

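$$C = \begin{pmatrix} 0 & \tfrac12 & 0 & \cdots & 0 & 0\\ \tfrac12 & 0 & \tfrac12 & \cdots & 0 & 0\\ \vdots & & & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 0 & \tfrac12\\ 0 & 0 & 0 & \cdots & \tfrac12 & 0 \end{pmatrix}.$$

For simplicity we have assumed that the mean value of $x_t$ is zero. Set

$$\Delta_n = |C - \lambda I|$$

for the determinant of order $n$. As

$$\Delta_n = -\lambda\Delta_{n-1} - \tfrac14\Delta_{n-2},$$

we have

$$\Delta_n = \alpha\xi_1^n + \beta\xi_2^n, \qquad (1)$$

where $\xi_1, \xi_2$ are the roots of the equation

$$\xi^2 + \lambda\xi + \tfrac14 = 0. \qquad (2)$$

Then

$$\Delta_1 = -\lambda = \xi_1 + \xi_2 = \alpha\xi_1 + \beta\xi_2,$$

$$\Delta_2 = \lambda^2 - \tfrac14 = \xi_1^2 + \xi_2^2 + \xi_1\xi_2 = \alpha\xi_1^2 + \beta\xi_2^2,$$

where we have just used the relation between roots and coefficients of equation (2). Then, solving for $\alpha, \beta$ and using (1), we find

$$\Delta_n = \frac{\xi_1^{n+1} - \xi_2^{n+1}}{\xi_1 - \xi_2}.$$

As $\xi_1\xi_2 = \tfrac14$ we can define $w$ by setting

$$\xi_1 = -\frac{w}{2}, \qquad \xi_2 = -\frac{1}{2w}.$$

Then $\lambda = (w + w^{-1})/2$, and $\Delta_n = 0$, $w^2 \neq 1$, implies that

$$\frac{w^{n+1} - w^{-n-1}}{w - w^{-1}} = 0.$$

Thus

$$w^{2(n+1)} = 1, \qquad w^2 \neq 1.$$

Hence

$$\lambda_k = \cos\frac{\pi k}{n+1}, \qquad k = 1, 2, \ldots, n. \qquad (3)$$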
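The determinant recursion can be checked directly. This sketch, offered only as a numerical illustration, evaluates $\Delta_n$ both by the recursion and by the closed form in $\xi_1, \xi_2$, and confirms that $\Delta_n$ vanishes at $\cos(\pi k/(n+1))$, $k = 1, \ldots, n$.

```python
import math


def delta_recursive(n, lam):
    """Delta_n = |C - lam*I| via Delta_k = -lam*Delta_{k-1} - Delta_{k-2}/4, Delta_0 = 1, Delta_1 = -lam."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, -lam
    for _ in range(n - 1):
        prev, cur = cur, -lam * cur - prev / 4.0
    return cur


def delta_closed_form(n, lam):
    """(xi1^(n+1) - xi2^(n+1)) / (xi1 - xi2), where xi1, xi2 are the roots of
    xi^2 + lam*xi + 1/4 = 0 (requires |lam| != 1 so that the roots differ)."""
    root = complex(lam * lam - 1.0) ** 0.5
    xi1, xi2 = (-lam + root) / 2, (-lam - root) / 2
    return ((xi1 ** (n + 1) - xi2 ** (n + 1)) / (xi1 - xi2)).real


if __name__ == "__main__":
    n = 8
    zeros = [math.cos(math.pi * k / (n + 1)) for k in range(1, n + 1)]
    print(max(abs(delta_recursive(n, z)) for z in zeros))
    print(delta_recursive(n, 0.3), delta_closed_form(n, 0.3))
```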


Note  that  all  these  eigenvalues  are  different,  in  contrast  to  the  circular 
case  discussed  in  the  last  section.  The  characteristic  function  will  now  not 
reduce  to  a  rational  function  and  we  can  scarcely  hope  for  an  explicit  in- 
version of  it. 

There are various ways in which to obtain approximations. The simplest (and crudest) way is to compute the first two moments of the statistic and use the normal approximation. A closer approximation has been suggested by Koopmans in an important paper [1]. He observed that the eigenvalues of the form in question were distributed asymptotically as the values of $\cos \pi x$, $0 < x < 1$, and used this to approximate the characteristic function. Whittle used the same idea in more complicated situations (Whittle [1]). It has been pointed out (see Grenander [3]) that this method is an application of a general theorem of Szegő [1] on the asymptotic distribution of the eigenvalues of Toeplitz forms, which we are going to prove.
Consider a Toeplitz matrix

$$T_n = \{c_{\nu-\mu};\ \nu, \mu = 1, 2, \ldots, n\}, \qquad \text{where } c_\nu = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i\nu x} g(x)\,dx.$$

Here $g(x)$ is a real measurable bounded function, $|g(x)| \le M < \infty$. Denote the eigenvalues by $\lambda_{1,n}, \lambda_{2,n}, \ldots, \lambda_{n,n}$. The quadratic form

$$\sum_{\nu,\mu=1}^{n} c_{\nu-\mu}\,x_\nu\bar{x}_\mu = \frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl|\sum_{\nu=1}^{n} x_\nu e^{i\nu x}\Bigr|^2 g(x)\,dx$$

is Hermitian, so that the $\lambda$'s are real, and the norm of $T_n$ satisfies $\|T_n\| \le M$, so that $|\lambda_{\nu,n}| \le M$. For any $n$ we can talk about the (discrete) distribution of the eigenvalues in the interval $(-M, M)$.

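Let $\alpha$ be a real variable, $|\alpha| < 1/M$. Then $(1/2\pi)[1 + \alpha g(x)] > 0$, and it can be considered as the spectral density of a stationary process $y_t$, which then has the covariance matrix

$$R_n = I_n + \alpha T_n.$$

We know that the residual variance

$$\sigma_\nu^2 = \min_{b_1, b_2, \ldots, b_\nu} E\left|y_t - b_1y_{t-1} - b_2y_{t-2} - \cdots - b_\nu y_{t-\nu}\right|^2 \qquad (4)$$

can be expressed in terms of covariance determinants (see Cramér [3], p. 305):

$$\sigma_\nu^2 = \frac{|R_{\nu+1}|}{|R_\nu|}.$$

But

$$|R_n| = \prod_{\nu=1}^{n}\left[1 + \alpha\lambda_{\nu,n}\right].$$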

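On the other hand (see section 2.2)

$$\lim_{\nu\to\infty}\sigma_\nu^2 = 2\pi\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\frac{1 + \alpha g(x)}{2\pi}\,dx\right\}.$$

As

$$\lim_{n\to\infty}\log\sigma_n^2 = \lim_{n\to\infty}\frac{1}{n}\sum_{\nu=1}^{n}\log\sigma_\nu^2 = \lim_{n\to\infty}\frac{\log|R_{n+1}| - \log|R_1|}{n} = \lim_{n\to\infty}\frac{1}{n}\sum_{\nu=1}^{n}\log\left[1 + \alpha\lambda_{\nu,n}\right],$$

we have

$$\lim_{n\to\infty}\frac{1}{n}\sum_{\nu=1}^{n}\log\left[1 + \alpha\lambda_{\nu,n}\right] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left[1 + \alpha g(x)\right]dx.$$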

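Introducing

$$S_{n,p} = \sum_{\nu=1}^{n}\lambda_{\nu,n}^p$$

and using

$$\log(1 + x) = \sum_{p=1}^{\infty}(-1)^{p-1}\frac{x^p}{p},$$

which converges for $|x| < 1$, we have for $|\alpha| < 1/M$

$$\lim_{n\to\infty}\sum_{p=1}^{\infty}\frac{(-1)^{p-1}}{p}\,\frac{S_{n,p}}{n}\,\alpha^p = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{p=1}^{\infty}\frac{(-1)^{p-1}}{p}\,g^p(x)\,\alpha^p\,dx = \sum_{p=1}^{\infty}\frac{(-1)^{p-1}}{p}\,\frac{1}{2\pi}\int_{-\pi}^{\pi}g^p(x)\,dx\;\alpha^p$$

because of the uniform convergence. Set

$$f_n(z) = \sum_{p=1}^{\infty}\frac{(-1)^{p-1}}{p}\,\frac{S_{n,p}}{n}\,z^p, \qquad f(z) = \sum_{p=1}^{\infty}\frac{(-1)^{p-1}}{p}\,\frac{1}{2\pi}\int_{-\pi}^{\pi}g^p(x)\,dx\;z^p.$$

For $|z| < 1/2M$ the sequence $f_n(z)$ is uniformly bounded. Further, for every real $z$ in the interval $(-1/M, 1/M)$, $f_n(z)$ converges to $f(z)$. It then follows from Vitali's theorem (see Titchmarsh [1], p. 168) that the sequence $f_n(z)$ converges uniformly in $|z| < 1/2M$, so that the coefficients of the power series converge as well. Then

$$\lim_{n\to\infty}\frac{S_{n,p}}{n} = \frac{1}{2\pi}\int_{-\pi}^{\pi}g^p(x)\,dx, \qquad p = 1, 2, \ldots$$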
Now we have a sequence of distributions in the finite interval $(-M, M)$. As the moments converge, the same is true of the distributions, since a distribution on a finite interval is uniquely determined by its moments. The limiting moments belong to the stochastic variable $g(x)$, where $x$ is uniformly distributed in $(-\pi, \pi)$. Hence we have proved that the eigenvalues of a Toeplitz form behave asymptotically like the ordinates of the function $g(x)$ with equidistributed $x$.
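The moment form of the theorem can be illustrated without computing any eigenvalues, since $S_{n,p} = \operatorname{tr} T_n^p$. The sketch below assumes, purely for illustration, the symbol $g(x) = 1 + \cos x + \tfrac12\cos 2x$, whose only nonzero coefficients are $c_0 = 1$, $c_{\pm1} = \tfrac12$, $c_{\pm2} = \tfrac14$, and compares $S_{n,p}/n$ with $(1/2\pi)\int g^p\,dx$.

```python
import math


def g(x):
    """Example symbol (an assumption for illustration): g(x) = 1 + cos x + 0.5 cos 2x."""
    return 1.0 + math.cos(x) + 0.5 * math.cos(2 * x)


# Fourier coefficients c_nu = (1/2pi) * int e^{i nu x} g(x) dx of the example symbol.
COEFFS = {0: 1.0, 1: 0.5, -1: 0.5, 2: 0.25, -2: 0.25}


def toeplitz(n):
    """T_n = {c_{nu - mu}}, nu, mu = 1..n."""
    return [[COEFFS.get(i - j, 0.0) for j in range(n)] for i in range(n)]


def matmul(a, b):
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def normalised_power_sum(n, p):
    """S_{n,p}/n = trace(T_n^p)/n, the p-th moment of the eigenvalue distribution."""
    t = toeplitz(n)
    m = t
    for _ in range(p - 1):
        m = matmul(m, t)
    return sum(m[i][i] for i in range(n)) / n


def limiting_moment(p, points=256):
    """(1/2pi) * int_{-pi}^{pi} g(x)^p dx; the equally spaced average is exact here
    because g^p is a trigonometric polynomial of degree 2p < points."""
    return sum(g(-math.pi + 2 * math.pi * k / points) ** p for k in range(points)) / points


if __name__ == "__main__":
    for p in (1, 2, 3, 4):
        approx = [round(normalised_power_sum(n, p), 4) for n in (20, 40, 80)]
        print(p, approx, round(limiting_moment(p), 4))
```

For a banded symbol like this one the discrepancy is exactly of order $1/n$ (for $p = 2$ it equals $0.75/n$), reflecting the boundary rows of $T_n$.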

The case of circulants is considerably easier than that of Toeplitz forms. Consider the symmetric circulant matrix $A$ of section 3.4. The symmetry condition implies that $a_2 = a_n$, $a_3 = a_{n-1}$, and so on. Hence (see section 3.4)

$$\lambda_{\nu,n} = a_1 + 2a_2\cos\frac{2\pi\nu}{n} + 2a_3\cos\frac{4\pi\nu}{n} + \cdots.$$

This series has a finite number of terms, and the last term depends on whether $n$ is odd or even. The eigenvalues are clearly the equidistributed ordinates of the partial sums corresponding to the Fourier series

$$h(x) \sim a_1 + 2\sum_{\nu=2}^{\infty}a_\nu\cos(\nu - 1)x.$$

If the $a_\nu$'s had been defined as Fourier coefficients of a bounded real function $g(x)$ symmetric about zero,

$$a_\nu = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{i(\nu-1)x}g(x)\,dx,$$

we would have $h(x) \sim g(x)$, that is, $h$ would be the Fourier series of $g$: the analogue of the above theorem.

$C$ in the present section is a Toeplitz matrix corresponding to $g(x) = \cos x$, and (3) agrees with the theorem just proved.

One should note that ratios of quadratic forms $Q_1/Q_2$ can be treated by considering the difference $Q_1 - uQ_2$, as in section 3.4. These results can be used to get approximations to the moments of interest, and they can be used to approximate the distribution by a Pearson curve or a Gram-Charlier series. Without going into details we refer to the papers of Koopmans, Rubin, Dixon, Whittle, and Leipnik in the bibliography. Proper care should be used in applying these approximations, as we know very little about the order of the error except in some special cases (see Eklind-Jung [1]).

Dixon [1] considers the somewhat more general case of testing one autoregressive scheme against another. The hypothesis to be tested is that the process is an autoregressive scheme of order $r > 0$ with unspecified coefficients. The alternative hypothesis is chosen as an autoregressive scheme of order $m > r$, still without specifying the coefficients. As such a scheme of the $r$th order is Markoffian of the $r$th order, it follows that the residual variance $\sigma_\nu^2$ (see formula (4)) decreases at first but is constant for $\nu \ge r$. Remembering the expression of $\sigma_\nu^2$ as the ratio between two covariance determinants, it would be intuitively plausible to use as a test statistic

$$U = \frac{D_{m+1}}{D_m}\cdot\frac{D_r}{D_{r+1}},$$

where $D_\nu$ denotes the empirical covariance determinant of order $\nu$. We reject the null hypothesis if $U < U_0$. Dixon uses the normal approximation to choose $U_0$. Whittle [1] has suggested using the approximations of the eigenvalue distribution just described. Dixon uses the circular statistics while Whittle uses the noncircular ones. See also Hsu [1], who suggests using the Cornish-Fisher form of the Charlier series to get asymptotic expansions for the distributions of statistics of this type.
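Dixon's statistic only needs empirical covariance determinants. The sketch below assumes empirical autocovariances with divisor $N$ and the sample mean removed (the text does not fix these details) and writes $U$ as a ratio of fitted residual variances $D_{\nu+1}/D_\nu$. The simulated data and all names are illustrative.

```python
import random


def autocovariances(x, max_lag):
    """Empirical autocovariances c_0, ..., c_max_lag with divisor N and the mean removed."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + s] for t in range(n - s)) / n for s in range(max_lag + 1)]


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    size = len(a)
    result = 1.0
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def cov_det(c, order):
    """D_order = det{c_|i-j|}, i, j = 1..order, with D_0 = 1."""
    if order == 0:
        return 1.0
    return determinant([[c[abs(i - j)] for j in range(order)] for i in range(order)])


def dixon_u(x, r, m):
    """U = (D_{m+1}/D_m) / (D_{r+1}/D_r): ratio of empirical residual variances of orders m and r."""
    c = autocovariances(x, m)
    return (cov_det(c, m + 1) / cov_det(c, m)) / (cov_det(c, r + 1) / cov_det(c, r))


def simulate_ar1(n, rho, seed, burn=100):
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n + burn):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x[burn:]


if __name__ == "__main__":
    x = simulate_ar1(2000, 0.6, seed=3)
    print(dixon_u(x, 1, 3))  # close to 1 when the first order null hypothesis holds
```

Because the empirical covariance matrix built with divisor $N$ is nonnegative definite, $D_{\nu+1}/D_\nu$ is nonincreasing in $\nu$, so this $U$ never exceeds one.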

The  methods  described  in  this  and  the  last  section  have  been  elaborated 
in  various  directions  by  various  authors.  We  do  not  pursue  this  topic  any 
further  because  we  believe  that  the  discussion  above  gives  an  accurate  pic- 
ture of  the  ideas  upon  which  these  techniques  are  based.  An  interesting 
observation  of  Durbin  and  Watson  [1]  allows  one  to  use  modified  noncircu- 
lar statistics.  The  reader  is  also  referred  to  the  papers  of  Hannan  [1], 
Champernowne  [1]  and  Wise  [1]. 

3.6.  Testing  Autoregressive  Schemes  and  Moving  Averages 

In  this  section  we  will  study  how  to  test  the  hypotheses  that  the  process 
is  an  autoregressive  scheme  or  a  moving  average  of  some  specified  order. 
In  the  last  section  we  discussed  briefly  a  test  designed  for  the  first  case, 
and  one  of  the  two  tests  we  discuss  here  is  an  alternative  test  for  this  hypo- 
thesis. 

Consider  a  real  autoregressive  scheme  of  order  k 

$$x_{t+k} + a_1x_{t+k-1} + \cdots + a_kx_t = \varepsilon_{t+k}, \qquad (1)$$

where the $\varepsilon$'s are independent with the same distribution. We also assume that the roots of the characteristic equation

$$z^k + a_1z^{k-1} + \cdots + a_k = 0$$

are all of modulus less than one. Then we know that $x_t$ can be expressed as an infinite moving average of the $\varepsilon$'s, with the summation extending only over past values of the time parameter.

Note that if this condition concerning the modulus of the roots is fulfilled, the parameters $a_1, a_2, \ldots, a_k$ can be estimated consistently (see section 3.7).

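We know that the spectral density is

$$f(\lambda) = \frac{\sigma^2}{2\pi}\Bigl|\sum_{\nu=0}^{k}a_\nu e^{i\nu\lambda}\Bigr|^{-2}, \qquad a_0 = 1,$$

so that the correlation coefficients are

$$\rho_v = \frac{\sigma^2}{2\pi V}\int_{-\pi}^{\pi}\frac{e^{iv\lambda}}{\bigl|\sum_{\nu=0}^{k}a_\nu e^{i\nu\lambda}\bigr|^2}\,d\lambda = \frac{\sigma^2}{2\pi i V}\oint_{|z|=1}\frac{z^v}{\varphi(z)\varphi(z^{-1})}\,\frac{dz}{z},$$

where

$$\varphi(z) = \sum_{\nu=0}^{k}a_\nu z^\nu$$

and $V$ is the variance of the process $x_t$. This means that $\rho_v$ is the $v$th coefficient of the Laurent series of $\sigma^2/[V\varphi(z)\varphi(z^{-1})]$. For convenience we take $\sigma^2 = 1$.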
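On defining the quantity

$$P_v = \sum_{j=-\infty}^{\infty}\rho_j\rho_{v-j}$$

we see that it corresponds in the same way to the function

$$V^{-2}\left[\varphi(z)\varphi(z^{-1})\right]^{-2}.$$

Similarly, we define $A_j$ by

$$\sum_{j=-\infty}^{\infty}A_jz^j = \varphi^2(z),$$

so that $A_j = 0$ for $j < 0$ and for $j > 2k$. Then

$$B_t = \sum_{j=-\infty}^{\infty}A_jP_{t-j}$$

corresponds to the function $\varphi^2(z)/V^2[\varphi(z)\varphi(z^{-1})]^2 = \varphi^{-2}(z^{-1})V^{-2}$, which has no poles for $|z| \ge 1$, so that

$$B_t = 0, \qquad t > 0.$$

Considering

$$C_t = \sum_{j=-\infty}^{\infty}A_j\rho_{t-j},$$

we see that it corresponds to $\varphi^2(z)/V\varphi(z)\varphi(z^{-1}) = \varphi(z)/V\varphi(z^{-1})$, so that

$$C_t = 0, \qquad t > k.$$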
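The identities $C_t = 0$ for $t > k$ and $B_t = 0$ for $t > 0$ can be verified numerically. The sketch assumes an example second order scheme, $x_t - 0.5x_{t-1} + 0.3x_{t-2} = \varepsilon_t$, and computes its correlations from truncated moving-average weights; the truncation lengths and the names are our own choices.

```python
def ar_correlations(a, max_lag, terms=3000):
    """rho_0..rho_max_lag and the variance for x_t + a_1 x_{t-1} + ... + a_k x_{t-k} = e_t
    (unit noise variance), using the weights psi_j of 1/phi(z), phi(z) = 1 + a_1 z + ... + a_k z^k.
    The infinite sums are truncated at `terms`, harmless when the roots are well inside the circle."""
    k = len(a)
    psi = [1.0]
    for j in range(1, terms):
        psi.append(-sum(a[i - 1] * psi[j - i] for i in range(1, min(j, k) + 1)))
    gamma = [sum(psi[j] * psi[j + v] for j in range(terms - v)) for v in range(max_lag + 1)]
    return [gv / gamma[0] for gv in gamma], gamma[0]


def square_coefficients(a):
    """A_0, ..., A_2k: coefficients of phi(z)^2."""
    phi = [1.0] + list(a)
    out = [0.0] * (2 * len(phi) - 1)
    for i, ci in enumerate(phi):
        for j, cj in enumerate(phi):
            out[i + j] += ci * cj
    return out


def corr(rho, v):
    v = abs(v)
    return rho[v] if v < len(rho) else 0.0


def c_t(A, rho, t):
    """C_t = sum_j A_j rho_{t-j}."""
    return sum(aj * corr(rho, t - j) for j, aj in enumerate(A))


def p_v(rho, v, width):
    """P_v = sum_j rho_j rho_{v-j}, truncated to |j| <= width."""
    return sum(corr(rho, j) * corr(rho, v - j) for j in range(-width, width + 1))


def b_t(A, rho, t, width=60):
    """B_t = sum_j A_j P_{t-j}."""
    return sum(aj * p_v(rho, t - j, width) for j, aj in enumerate(A))


if __name__ == "__main__":
    a = [-0.5, 0.3]  # roots of z^2 - 0.5 z + 0.3 = 0 have modulus about 0.55
    rho, variance = ar_correlations(a, 80)
    A = square_coefficients(a)
    print([round(c_t(A, rho, t), 12) for t in range(0, 8)])
    print([round(b_t(A, rho, t), 12) for t in range(1, 8)])
```

The nonzero value at $t = 0$ can also be checked by hand: $C_0$ is the constant term of $\varphi(z)/V\varphi(z^{-1})$, namely $(1 + a_1\psi_1 + a_2\psi_2)/V$.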
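As a test statistic we shall use

$$R_s = \sum_{j=-\infty}^{\infty}A_j\rho^*_{s-j},$$

where $\rho^*_s$ is the $s$th empirical correlation coefficient. It is easy to prove that

$$E\rho^*_s \sim \rho_s, \qquad \operatorname{cov}[\rho^*_s, \rho^*_t] \sim \frac{1}{n}\left\{P_{t-s} + P_{t+s} + 2\rho_t\rho_sP_0 - 2\rho_sP_t - 2\rho_tP_s\right\} \qquad (2)$$

as $n \to \infty$ (see Bartlett [1], p. 30). Hence

$$ER_s = \sum_{j=0}^{2k}A_jE\rho^*_{s-j} \to \sum_{j=0}^{2k}A_j\rho_{s-j} = C_s = 0, \qquad s = k+1, k+2, \ldots$$

Now, for $t > k$,

$$\operatorname{cov}(\rho^*_s, R_t) \sim \frac{1}{n}\sum_{j=0}^{2k}A_j\left\{P_{t-j-s} + P_{t-j+s} + 2\rho_s\rho_{t-j}P_0 - 2\rho_sP_{t-j} - 2\rho_{t-j}P_s\right\} = \begin{cases}\dfrac{1}{n}\displaystyle\sum_{j=0}^{2k}A_jP_j & \text{if } s = t,\\[1ex] 0 & \text{if } t > s.\end{cases}$$

Hence

$$\operatorname{cov}(R_s, R_t) = \sum_{j=0}^{2k}A_j\operatorname{cov}(\rho^*_{s-j}, R_t) \sim \begin{cases}0 & \text{if } t > s,\\[1ex] \dfrac{1}{n}\displaystyle\sum_{j=0}^{2k}A_jP_j & \text{if } t = s.\end{cases}$$

This result is due to Quenouille [1], who suggests the following application. If $x_t$ satisfies (1), then the quadratic forms $R_{k+1}, R_{k+2}, \ldots, R_q$ are asymptotically normally distributed (this can be reduced to an application of the central limit law) and independent, with mean zero and variance

$$\frac{1}{n}\sum_{j=0}^{2k}A_jP_j.$$

Further, $\rho^*_1, \rho^*_2, \ldots, \rho^*_k$ are asymptotically normal with means $\rho_1, \rho_2, \ldots, \rho_k$ and independent of $R_{k+1}, R_{k+2}, \ldots$.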

Consider first the case when the coefficients of the difference equation are specified. Then we can test the corresponding hypothesis by using, e.g.,

$$\chi^2_l = \frac{n}{\sum_{j=0}^{2k}A_jP_j}\left(R_{k+1}^2 + R_{k+2}^2 + \cdots + R_{k+l}^2\right),$$

which asymptotically has a $\chi^2$-distribution with $l$ degrees of freedom under the null hypothesis. As a critical region we can choose

$$\chi^2_l > \text{const.}$$

When the order of the autoregressive scheme, but not the coefficients, is specified, we first estimate $a_1, a_2, \ldots, a_k$ using the statistics $\rho^*_1, \rho^*_2, \ldots, \rho^*_k$ (see section 3.7) and then proceed as above. The estimation does not disturb the significance level of the test, at least not for large samples, because of the asymptotic independence of the two sets of statistics.

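The simplest case is $k = 1$,

$$x_{t+1} - \rho x_t = \varepsilon_{t+1},$$

and since the root of the characteristic equation must lie inside the unit circle, $|\rho| < 1$, we have

$$(1 - \rho z)^2 = \sum_{j=0}^{2k}A_jz^j,$$

so that

$$A_0 = 1, \qquad A_1 = -2\rho, \qquad A_2 = \rho^2,$$

$$R_s = \rho^*_s - 2\rho\,\rho^*_{s-1} + \rho^2\rho^*_{s-2}.$$

We might use $\rho^*_1$ as an estimate of $\rho$.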
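For $k = 1$ the variance $\frac1n\sum_j A_jP_j$ works out to $(1-\rho^2)^2/n$, so the test can be sketched in a few lines. The code below is illustrative only: it assumes the usual mean-corrected empirical correlations, estimates $\rho$ by $\rho^*_1$ as suggested above, and uses simulated data with arbitrary seeds.

```python
import random


def sample_correlations(x, max_lag):
    """rho*_s = sum_t d_t d_{t+s} / sum_t d_t^2 with d = x - mean (an assumed, common definition)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + s] for t in range(n - s)) / c0 for s in range(max_lag + 1)]


def quenouille_ar1(x, l):
    """chi^2 statistic with l degrees of freedom for H0: first order autoregression.
    rho is estimated by rho*_1; R_s = rho*_s - 2 rho rho*_{s-1} + rho^2 rho*_{s-2},
    s = 2, ..., l + 1, each with asymptotic variance (1 - rho^2)^2 / n."""
    n = len(x)
    r = sample_correlations(x, l + 1)
    rho = r[1]
    big_r = [r[s] - 2 * rho * r[s - 1] + rho * rho * r[s - 2] for s in range(2, l + 2)]
    return n * sum(v * v for v in big_r) / (1 - rho * rho) ** 2


def simulate_ar(coeffs, n, seed, burn=200):
    """x_t = coeffs[0] x_{t-1} + coeffs[1] x_{t-2} + ... + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    hist = [0.0] * len(coeffs)
    out = []
    for _ in range(n + burn):
        new = sum(c * h for c, h in zip(coeffs, hist)) + rng.gauss(0.0, 1.0)
        hist = [new] + hist[:-1]
        out.append(new)
    return out[burn:]


if __name__ == "__main__":
    null_data = simulate_ar([0.6], 3000, seed=11)       # first order scheme: H0 true
    alt_data = simulate_ar([0.0, 0.6], 3000, seed=12)   # second order scheme: H0 false
    print(quenouille_ar1(null_data, 5), quenouille_ar1(alt_data, 5))  # compare with chi^2_5
```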

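Wold [2] has developed a test for the hypothesis that the process is a moving average of order $h$ with coefficients $1, b_1, b_2, \ldots, b_h$,

$$x_t = \varepsilon_t + b_1\varepsilon_{t-1} + \cdots + b_h\varepsilon_{t-h},$$

where $\varepsilon_t$ is pure white noise as before. It has the autocorrelation coefficients

$$\rho_1 = \frac{1}{K}\left[b_1 + b_2b_1 + b_3b_2 + \cdots + b_hb_{h-1}\right],$$

$$\rho_2 = \frac{1}{K}\left[b_2 + b_3b_1 + b_4b_2 + \cdots + b_hb_{h-2}\right],$$

$$\vdots$$

$$\rho_h = \frac{b_h}{K},$$

$$\rho_\nu = 0, \qquad \nu > h,$$

$$K = 1 + b_1^2 + b_2^2 + \cdots + b_h^2. \qquad (3)$$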
Consider the symmetric matrix

$$P = \{P_{i-k};\ i, k = 1, 2, \ldots, n\}.$$

As the covariance matrix $P$ is positive definite, it can be written as (see Wintner [1], p. 28)

$$P = YY^*,$$

where $Y = \{y_{ik};\ i, k = 1, 2, \ldots, n\}$ is triangular and nonsingular:

$$y_{ii} > 0, \qquad y_{ik} = 0, \quad i < k.$$

$P$ cannot be singular, as it is the covariance matrix of a process with a continuous spectral density positive almost everywhere. If

$$0 = z^*Pz = \sum_{\nu,\mu=1}^{n}z_\nu\bar{z}_\mu P_{\nu-\mu} = \int_{-\pi}^{\pi}\Bigl|\sum_{\nu=1}^{n}z_\nu e^{i\nu\lambda}\Bigr|^2 f(\lambda)\,d\lambda,$$

then $f(\lambda)\bigl|\sum z_\nu e^{i\nu\lambda}\bigr|^2 = 0$ a.e., which implies that $z = 0$. Then

$$Z = \{z_{ik};\ i, k = 1, 2, \ldots, n\} = Y^{-1}$$

exists and is also triangular. Form the statistics

$$R_{h+i} = z_{i,1}\rho^*_{h+1} + z_{i,2}\rho^*_{h+2} + \cdots + z_{i,i}\rho^*_{h+i}, \qquad i = 1, 2, \ldots, f.$$

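For large samples we have $ER_{h+i} \to 0$, since $\rho_\nu = 0$ for $\nu > h$. Formula (2) gives us

$$\operatorname{cov}(\rho^*_{h+i}, \rho^*_{h+k}) \sim \frac{1}{n}P_{i-k},$$

so that

$$\operatorname{cov}(R_{h+i}, R_{h+k}) \sim \frac{1}{n}\sum_{\nu,\mu=1}^{n}z_{i,\nu}z_{k,\mu}P_{\nu-\mu}.$$

But the sum is the $(i, k)$th element in the matrix

$$ZPZ^* = Y^{-1}YY^*Y^{*-1} = I.$$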

To test the hypothesis in question we form $R_{h+1}, R_{h+2}, \ldots, R_{h+f}$ and

$$\chi^2_f = n\left[R_{h+1}^2 + R_{h+2}^2 + \cdots + R_{h+f}^2\right].$$

Under the null hypothesis this has asymptotically a $\chi^2$-distribution with $f$ degrees of freedom, and it is reasonable to use a critical region of the type $\chi^2_f > \text{const}$.
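Wold's procedure can be sketched as follows: build the leading $f \times f$ block of $P$ from the hypothesised coefficients via (3), factor it as $YY^*$ by a Cholesky decomposition, invert the triangular factor, and form $n\sum R_{h+i}^2$. Only the first $f$ rows of $Z$ enter the statistic because $Z$ is triangular. Sample sizes, seeds, and names are illustrative assumptions.

```python
import math
import random


def ma_correlations(b):
    """rho_0, ..., rho_h of x_t = e_t + b_1 e_{t-1} + ... + b_h e_{t-h}, as in (3)."""
    coeffs = [1.0] + list(b)
    h = len(b)
    k = sum(c * c for c in coeffs)
    return [sum(coeffs[j] * coeffs[j + v] for j in range(h + 1 - v)) / k for v in range(h + 1)]


def corr(rho, v):
    v = abs(v)
    return rho[v] if v < len(rho) else 0.0


def p_matrix(rho, f):
    """Leading f x f block of P = {P_{i-k}} with P_v = sum_j rho_j rho_{v-j}."""
    h = len(rho) - 1

    def p(v):
        return sum(corr(rho, j) * corr(rho, v - j) for j in range(-h, h + 1))

    return [[p(i - k) for k in range(f)] for i in range(f)]


def cholesky_lower(a):
    """Lower-triangular Y with positive diagonal and Y Y^T = a."""
    n = len(a)
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = a[i][k] - sum(y[i][j] * y[k][j] for j in range(k))
            y[i][k] = math.sqrt(s) if i == k else s / y[k][k]
    return y


def lower_inverse(y):
    """Z = Y^{-1} for lower-triangular Y, by forward substitution one column at a time."""
    n = len(y)
    z = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(col, n):
            s = (1.0 if i == col else 0.0) - sum(y[i][j] * z[j][col] for j in range(col, i))
            z[i][col] = s / y[i][i]
    return z


def sample_correlations(x, max_lag):
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + s] for t in range(n - s)) / c0 for s in range(max_lag + 1)]


def wold_statistic(x, b, f):
    """chi^2_f = n * sum_i R_{h+i}^2 with R_{h+i} = sum_{j <= i} z_{ij} rho*_{h+j}."""
    h, n = len(b), len(x)
    r = sample_correlations(x, h + f)
    z = lower_inverse(cholesky_lower(p_matrix(ma_correlations(b), f)))
    big_r = [sum(z[i][j] * r[h + 1 + j] for j in range(i + 1)) for i in range(f)]
    return n * sum(v * v for v in big_r)


def simulate_ma(b, n, seed):
    rng = random.Random(seed)
    h = len(b)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + h)]
    return [e[t + h] + sum(b[i] * e[t + h - 1 - i] for i in range(h)) for t in range(n)]


if __name__ == "__main__":
    print(wold_statistic(simulate_ma([0.5], 3000, seed=21), [0.5], 4))            # H0 true
    print(wold_statistic(simulate_ma([0.5, 0.6, 0.5], 3000, seed=22), [0.5], 4))  # H0 false
```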

The reader might object that the derivations given in this section are not quite complete. To give rigorous derivations would carry us too far in a direction that we do not intend to follow, but we believe that the results are correct as limit theorems as $n \to \infty$ when $f$ is kept fixed.

3.7.  Estimation  and  the  Asymptotic  Distribution  of  the 
Coefficients  of  an  Autoregressive  Scheme 

Let $x_t$ be a stationary autoregressive scheme satisfying the difference equation

$$x_t = \alpha_1x_{t-1} + \alpha_2x_{t-2} + \cdots + \alpha_px_{t-p} + \alpha_0 + \varepsilon_t,$$

where the stochastic variables $\varepsilon_t$ are independent and identically distributed with mean value zero, and the roots of the equation

$$z^p - \alpha_1z^{p-1} - \cdots - \alpha_p = 0 \qquad (1)$$

have modulus less than one. Here the order $p$ of the autoregressive scheme is assumed known. It will be adequate to assume that the first four moments of the $\varepsilon_t$ exist.

The process $x_t$ is not assumed to be normal. It is not unreasonable to try, as estimates of the parameters $\alpha_0, \alpha_1, \ldots, \alpha_p, \sigma^2 = E\varepsilon_t^2$ of the process, the maximum likelihood estimates of these parameters derived under the assumption of normality of the process, even when the process is not normal. If the process were normal, the probability density of $x_1, \ldots, x_N$ (given the initial values $x_0, x_{-1}, \ldots, x_{1-p}$) would be

$$\frac{1}{(2\pi)^{N/2}\sigma^N}\exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{N}\left(x_t - \alpha_1x_{t-1} - \cdots - \alpha_px_{t-p} - \alpha_0\right)^2\right].$$

The maximum likelihood estimates maximize this expression. On differentiating the logarithm of the density with respect to the parameters, we obtain the following system of equations that the estimates $a_0, a_1, \ldots, a_p, s^2$ of $\alpha_0, \alpha_1, \ldots, \alpha_p, \sigma^2$ satisfy:

$$\sum_{t=1}^{N}x_t - \sum_{i=1}^{p}a_i\sum_{t=1}^{N}x_{t-i} - Na_0 = 0,$$

$$\sum_{t=1}^{N}x_tx_{t-j} - \sum_{i=1}^{p}a_i\sum_{t=1}^{N}x_{t-i}x_{t-j} - a_0\sum_{t=1}^{N}x_{t-j} = 0 \qquad (j = 1, \ldots, p),$$

$$s^2 = \frac{1}{N}\sum_{t=1}^{N}\left(x_t - a_1x_{t-1} - \cdots - a_px_{t-p} - a_0\right)^2.$$


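We introduce the stochastic variables

$$y_0 = \frac{1}{\sqrt N}\sum_{t=1}^{N}\left(x_t - \alpha_1x_{t-1} - \cdots - \alpha_px_{t-p} - \alpha_0\right) = \frac{1}{\sqrt N}\sum_{t=1}^{N}\varepsilon_t,$$

$$y_i = \frac{1}{\sqrt N}\sum_{t=1}^{N}\left(x_t - \alpha_1x_{t-1} - \cdots - \alpha_px_{t-p} - \alpha_0\right)x_{t-i} = \frac{1}{\sqrt N}\sum_{t=1}^{N}\varepsilon_tx_{t-i}, \qquad i = 1, \ldots, p.$$

Note that

$$y_i = \sum_{j=1}^{p}\sqrt N(a_j - \alpha_j)\sum_{t=1}^{N}x_{t-i}x_{t-j}/N + \sqrt N(a_0 - \alpha_0)\sum_{t=1}^{N}x_{t-i}/N \qquad (i = 1, \ldots, p),$$

$$y_0 = \sum_{j=1}^{p}\sqrt N(a_j - \alpha_j)\sum_{t=1}^{N}x_{t-j}/N + \sqrt N(a_0 - \alpha_0). \qquad (2)$$

Assume that the variance of $x_t$ is one. Let $Ex_t = c$. Clearly

$$E\,\frac{1}{N}\sum_{t=1}^{N}x_t = c.$$

Moreover

$$E\left(\frac{1}{N}\sum_{t=1}^{N}x_t - c\right)^2 = \frac{1}{N^2}\sum_{t,s=1}^{N}\rho_{t-s} = \frac{1}{N}\sum_{s=-N+1}^{N-1}\left(1 - \frac{|s|}{N}\right)\rho_s,$$

and $E\bigl[\frac{1}{N}\sum_{t=1}^{N}x_{t-i}x_{t-j} - \rho_{i-j} - c^2\bigr]^2$ is of order $1/N$, with a constant involving sums of products of the $\rho_\nu$, the mean $c$, and $e = E\varepsilon_t^4/\sigma^4 - 3$ (see Bartlett [1]). On the other hand

$$|\rho_s| \le KR^{|s|},$$

where $K$ is a constant and $0 < R < 1$, since all the roots of (1) have modulus less than one. Thus

$$E\left(\frac{1}{N}\sum_{t=1}^{N}x_{t-i} - c\right)^2 \to 0, \qquad E\left(\frac{1}{N}\sum_{t=1}^{N}x_{t-i}x_{t-j} - \rho_{i-j} - c^2\right)^2 \to 0,$$

and

$$\frac{1}{N}\sum_{t=1}^{N}x_{t-i}, \qquad \frac{1}{N}\sum_{t=1}^{N}x_{t-i}x_{t-j}$$

are consistent estimates of $c$ and $\rho_{i-j} + c^2$, respectively. Set

$$\frac{1}{N}\sum_{t=1}^{N}x_{t-i} = D_{i0N} = D_{0iN}, \quad i \neq 0; \qquad \frac{1}{N}\sum_{t=1}^{N}x_{t-i}x_{t-j} = D_{ijN}, \quad i, j \neq 0; \qquad 1 = D_{00N}.$$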

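The system of equations (2) can then be rewritten in the form

$$y_i = \sum_{j=0}^{p}\sqrt N(a_j - \alpha_j)D_{ijN}, \qquad i = 0, 1, \ldots, p. \qquad (3)$$

As we have seen,

$$D_{ijN} \to D_{ij} = \begin{cases}\rho_{i-j} + c^2, & i, j \neq 0,\\ 1, & i = j = 0,\\ c, & \text{otherwise},\end{cases}$$

in probability as $N \to \infty$. The limiting determinant in probability of the system of equations is, after reduction,

$$|D_{ij}| = \begin{vmatrix}1 & 0 & 0 & \cdots & 0\\ 0 & \rho_0 & \rho_1 & \cdots & \rho_{p-1}\\ 0 & \rho_{-1} & \rho_0 & \cdots & \rho_{p-2}\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & \rho_{1-p} & \rho_{2-p} & \cdots & \rho_0\end{vmatrix} \neq 0.$$

On the other hand, the stochastic variables $y_i$, $i = 0, 1, \ldots, p$, are asymptotically normally distributed with mean zero (since the $y_i$'s have mean zero) and covariance matrix (see Diananda [1])

$$\{Ey_iy_j\} = \sigma^2\begin{pmatrix}1 & c & c & \cdots\\ c & \rho_0 + c^2 & \rho_1 + c^2 & \cdots\\ c & \rho_{-1} + c^2 & \rho_0 + c^2 & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}.$$

Solving equations (3) for $\sqrt N(a_j - \alpha_j)$ in terms of the $y_j$'s, we see that the stochastic variables $\sqrt N(a_i - \alpha_i)$ are asymptotically normally distributed with mean zero and covariance matrix

$$\sigma^2\begin{pmatrix}1 & c & c & \cdots\\ c & \rho_0 + c^2 & \rho_1 + c^2 & \cdots\\ c & \rho_{-1} + c^2 & \rho_0 + c^2 & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}^{-1}.$$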


The estimates $a_i$ are clearly consistent estimates of the parameters $\alpha_i$, and therefore

$$s^2 = \frac{1}{N}\sum_{t=1}^{N}\left(x_t - a_1x_{t-1} - \cdots - a_px_{t-p} - a_0\right)^2$$

is a consistent estimate of $\sigma^2$. Mann and Wald prove this without assuming stationarity. However, the proof in their somewhat more general context is the same as that given above. A nonstationary process of the autoregressive type satisfying the other assumptions given above approaches the stationary case as $t \to \infty$. Since the theorem proved above is an asymptotic theorem, one would not expect anything different to arise in the nonstationary case.
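Under the normal likelihood the equations above are the least-squares normal equations, so the estimates can be sketched with a small linear solve. The code assumes, as is common in practice, that the sums run over $t = p+1, \ldots, N$ so that all lagged values are observed; the simulated parameters and the names are arbitrary illustrations.

```python
import random


def solve_linear(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        sol[i] = (m[i][n] - sum(m[i][j] * sol[j] for j in range(i + 1, n))) / m[i][i]
    return sol


def fit_ar(x, p):
    """Estimates (a_0, a_1, ..., a_p) from the normal equations, and s^2 as the mean squared residual."""
    rows = [[1.0] + [x[t - i] for i in range(1, p + 1)] for t in range(p, len(x))]
    ys = x[p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    resid = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, ys)]
    return coef, sum(e * e for e in resid) / len(resid)


def simulate(alpha0, alphas, n, seed, burn=500):
    """x_t = alpha0 + alphas[0] x_{t-1} + ... + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    hist = [0.0] * len(alphas)
    out = []
    for _ in range(n + burn):
        new = alpha0 + sum(a * h for a, h in zip(alphas, hist)) + rng.gauss(0.0, 1.0)
        hist = [new] + hist[:-1]
        out.append(new)
    return out[burn:]


if __name__ == "__main__":
    x = simulate(1.0, [0.4, 0.3], 20000, seed=5)
    coef, s2 = fit_ar(x, 2)
    print([round(c, 3) for c in coef], round(s2, 3))
```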
In our discussion of the autoregressive scheme, we assumed that the roots of equation (1) have modulus less than one. This assumption is necessary because without such a restriction the parameters $\alpha_i$, $\sigma^2$ are in general not uniquely determined. When the parameters in a statistical context are not uniquely determined by the probability distributions, and hence clearly not capable of being estimated, the parameters are called nonidentifiable. We consider a simple autoregressive scheme to illustrate the fact that one has nonidentifiability of the parameters $\alpha_1$, $\sigma^2$ without the restriction that the roots of (1) have modulus less than one. Let $x_t$ be a normal stationary process with mean zero and spectral density $(1/2\pi)\,(1/|e^{i\lambda} - \rho|^2)$, where $\rho$ is a real number with $0 < |\rho| < 1$. Then $x_t$ satisfies the difference equation

$$x_t - \rho x_{t-1} = \varepsilon_t,$$

where the $\varepsilon_t$'s are normal, independent stochastic variables with mean zero and variance one, and also the difference equation

$$x_t - \frac{1}{\rho}x_{t-1} = \eta_t,$$

where the $\eta_t$'s are normal and independent with mean zero and variance $1/\rho^2$.
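The two representations give the same spectral density, which is exactly why the data cannot distinguish them. This short check, an illustration with names of our own choosing, compares $(1/2\pi)/|e^{i\lambda} - \rho|^2$ with $(1/\rho^2)/(2\pi|e^{i\lambda} - 1/\rho|^2)$ on a grid of frequencies.

```python
import cmath
import math


def ar1_spectral_density(lam, coef, noise_var):
    """Spectral density of x_t - coef * x_{t-1} = white noise with variance noise_var:
    noise_var / (2 pi |e^{i lam} - coef|^2)."""
    return noise_var / (2 * math.pi * abs(cmath.exp(1j * lam) - coef) ** 2)


def max_discrepancy(rho, points=1000):
    """Largest difference between the (rho, 1) and (1/rho, 1/rho^2) parametrisations on a grid."""
    grid = [-math.pi + 2 * math.pi * k / points for k in range(points + 1)]
    return max(abs(ar1_spectral_density(lam, rho, 1.0)
                   - ar1_spectral_density(lam, 1.0 / rho, 1.0 / rho ** 2)) for lam in grid)


if __name__ == "__main__":
    print(max_discrepancy(0.6), max_discrepancy(-0.3))
```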
We note that Mann and Wald have also discussed vector processes $(x_{1t}, \ldots, x_{rt})$ satisfying a system of linear difference equations

$$\sum_{j=1}^{r}\sum_{k=0}^{p}\alpha_{ijk}x_{j,t-k} + \alpha_i = \varepsilon_{it} \qquad (i = 1, \ldots, r)$$

with restrictions on the vectors $(\varepsilon_{1t}, \ldots, \varepsilon_{rt})$ and on the parameters of the system paralleling those imposed on the difference equation in the 1-dimensional case. Under the assumption that $\{\alpha_{ij0}\}$ is the unit matrix, they obtain analogous results on the estimation of the parameters of the system. The assumption that $\{\alpha_{ij0}\}$ is the unit matrix is natural and essential. Otherwise the parameters of the system are not uniquely determined, that is, not identifiable.

T.  W.  Anderson  has  looked  into  the  power  aspects  of  certain  tests  men- 
tioned in  this  chapter  in  one  of  his  papers  [1].  The  interested  reader  will 
benefit  by  studying  this  paper  and,  further,  R.  L.  Anderson  and  T.  W. 
Anderson  [1]. 


3.8.  Discussion  of  the  Methods  described  in  this  Chapter 

The exposition given in this chapter is not complete in the sense of covering all the work that has been done on finite parameter models. References to other work can be found in Kendall [1]. However, it gives a fairly representative picture of the work of the past few decades and of the main ideas advanced to solve problems in this field.

All  these  methods  deal  with  models  that  are  of  the  finite  parameter  form: 
a  finite  moving  average  or  an  autoregressive  process  perhaps  with  a  trigono- 
metric regression  component  superimposed.  This  is  certainly  sensible  since 
a  wide  class  of  stationary  processes  can  be  approximated  by  these  schemes 
as  closely  as  is  required  by  choosing  a  model  of  sufficiently  high  order. 

Of  course  in  practice  one  very  seldom  has  a  priori  information  concerning 
the  order  of  the  scheme  or  even  that  it  is  a  finite  parameter  scheme  (except 
perhaps  in  a  few  special  contexts).  It  is  therefore  not  a  very  realistic  pro- 
cedure to  test  various  assumptions  of  this  type  against  each  other,  e.g. 
an  autoregressive  scheme  of  order  two  against  one  of  order  three.  An  ex- 
ception should  be  made  for  schemes  of  order  zero,  i.e.  pure  white  noise,  as 
these  have  a  more  immediate  interpretation  in  practical  terms. 

One  difficulty  in  many  of  the  applications  of  time  series  analysis  is  that 
there  is  very  little  theory  built  up  from  experience  so  that  one  is  not  led  to 
very  well  specified  schemes.  In  such  fields  it  seems  more  promising  to  use 
empirical  data  to  form  confidence  regions  for  the  models  than  to  test  sharply 
defined  models  whose  validity  is  questionable  to  say  the  least.  These  testing 
problems  may  have  some  theoretical  interest,  but  they  are  seldom  relevant 
to  problems  arising  in  practice. 

In engineering and in the physical sciences there has been a strong demand for realistic methods to analyse stationary time series. We believe that the approach taken by the research workers in these fields is more promising and in closer contact with reality than some of the earlier techniques developed by theoreticians. This belief is strengthened by the success of the methods in various applied fields; some of these methods will be discussed later on in this book. A discussion of related topics can be found in Grenander [5].

This  approach  consists  in  not  specifying  the  model  very  much,  and  in- 
stead of  dealing  with  a  finite  number  of  parameters  one  considers  the  spectral 
density  or  some  similar  nonparametric  concept.  Although  we  will  develop 
some  tests  of  hypotheses,  our  main  concern  will  be  to  form  confidence  re- 
gions. Many  statistical  problems  arise  and  we  will  discuss  some  of  them  in 
the  following  chapters.  Very  much  remains  to  be  done,  especially  in  extend- 
ing the  scope  of  relevant  techniques  to  wider  classes  of  stochastic  processes. 
However,  we  believe  that  the  results  obtained  indicate  that  a  statistical 
theory  of  stationary  time  series  of  great  practical  value  can  be  constructed. 
The success of such a nonparametric approach in certain fields does not guarantee its success in other fields, e.g., in the social sciences and economics. This can be judged only after application of these techniques to relevant data arising in these fields, but one suspects that the stationarity assumption may break down in such contexts.

CHAPTER 4


ESTIMATION  OF  THE  SPECTRUM 


4.0.  Introduction 

We now proceed to study one of the nonparametric problems mentioned in section 3.8, specifically the estimation of the spectral density when the class of admissible spectra is restricted only by mild regularity assumptions to be introduced later.

At  first  we  consider  a  very  wide  family  of  estimates  and  derive  some  of 
their  properties.  This  leads  us  to  narrow  down  the  family  considered, 
especially  when  we  see  that  the  so-called  spectrograph  estimates  (see  section 
4.2)  are  optimal  in  a  certain  sense. 

After  studying  the  spectrograph  estimates  and  a  closely  related  family, 
we  examine  some  specific  estimates  in  section  4.6.  Other  examples  are  to 
be  found  in  the  next  chapter.  Using  the  mean  square  error  as  an  optimality 
criterion  we  try  to  choose  good  weight  functions  for  these  estimates  in  sec- 
tion 4.7.  As  only  some  of  the  simpler  cases  are  considered,  we  cannot  offer 
any  final  and  unique  solution  of  this  problem;  instead  one  has  to  make  the 
choice  of  a  good  estimate  from  case  to  case.  A  good  deal  of  work  remains 
before  this  problem  can  be  settled  adequately. 

4.1.  A  General  Class  of  Estimates 

Let $x_1, x_2, \ldots, x_N$ be a sample time series drawn from the real stationary stochastic process $\{x_n\}$, $Ex_n = 0$, $Ex_n^2 < \infty$. We want to estimate the spectrum of the stochastic process $\{x_n\}$ from the time series $x_1, x_2, \ldots, x_N$. The discussion of section 1.7 indicates that if the spectrum has jumps, one cannot in general expect to estimate it consistently. Since the singular part of the spectrum is of little practical interest, we restrict ourselves to an absolutely continuous spectrum with a continuous spectral density $f(\lambda)$. The fact that the stochastic process is real implies that $f(\lambda)$ is symmetric about zero. The problem of estimating the spectrum can be approached in several different ways. One might be interested in estimating the spectral mass $F(\lambda_2) - F(\lambda_1)$ in a fixed interval $(\lambda_1, \lambda_2)$, or perhaps in estimating the spectral density at the point $\lambda$. It is clear that for this purpose it is necessary to use some nonlinear device, and as the quadratic ones are the most tractable analytically, we restrict ourselves to estimates which are quadratic forms in the observations.

Let  the  quadratic  form  be 

1        N 

znis  v<fJ,=\ 

Since the observations $x_\nu$ are real and we are trying to estimate a real quantity, it seems reasonable to assume that $b_{\nu\mu}^{(N)}$ is real. In fact, since $\alpha_N^*$ is real and the $x$'s are real, the $b_{\nu\mu}^{(N)}$ can always be chosen real. We can always find a function $W_N(x,y)$ defined in the square $-\pi \le x, y \le \pi$ and of bounded variation, that is,

$$\iint_{-\pi}^{\pi}|dW_N(x,y)| < \infty,$$

such that

$$b_{\nu\mu}^{(N)} = \iint_{-\pi}^{\pi}e^{i\nu x - i\mu y}\,dW_N(x,y), \qquad \nu,\mu = 1,2,\ldots,N. \qquad (2)$$

The condition $b_{\nu\mu}^{(N)} = b_{\mu\nu}^{(N)}$ will be satisfied if $dW_N(-y,-x) = dW_N(x,y)$. Note that the weight function $W_N(x,y)$ is not uniquely determined by $b_{\nu\mu}^{(N)}$, $\nu,\mu = 1,2,\ldots,N$. The Fourier-Stieltjes coefficients of $W_N(x,y)$ with $\nu$, $\mu$ not both in the range 1 to $N$ are at our disposal. If these coefficients are set equal to zero,

$$W_N(x,y) = \int_{-\pi}^{x}\int_{-\pi}^{y}w_N(u,v)\,du\,dv$$

is absolutely continuous with

$$w_N(x,y) = \frac{1}{(2\pi)^2}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}e^{-i\nu x + i\mu y}.$$

However, this will not always be the convenient choice for us. As the spectral mass is a nonnegative quantity, it may seem reasonable to require that $\alpha_N^*$ be a nonnegative form. One should then choose a nonnegative definite weight function $W_N(x,y)$, that is, a weight function such that

$$\iint_{-\pi}^{\pi}p(x)\overline{p(y)}\,dW_N(x,y) \ge 0$$

for any continuous function $p(x)$. Indeed, with $p(x) = \sum_{\nu=1}^{N}x_\nu e^{i\nu x}$ the integral equals $\sum_{\nu,\mu}b_{\nu\mu}^{(N)}x_\nu x_\mu = 2\pi N\alpha_N^*$ by (2). Note that $dW_N(x,x) \ge 0$ if $W_N(x,y)$ is nonnegative definite. Nonetheless we will later discuss a few estimates whose weight functions are not nonnegative definite.
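Relation (2) and its inverse, and the reason a nonnegative definite weight function yields a nonnegative form, can be checked numerically. The sketch below is only an illustration under our own assumptions: it uses NumPy, an arbitrary nonnegative definite coefficient matrix, and hypothetical helper names (`weight_on_grid`, `coefficients_from_grid`, `form_via_weight`). Integrals over $[-\pi,\pi)^2$ are replaced by Riemann sums on an $M \times M$ grid; for $M \ge N$ these sums are exact up to rounding, because only frequencies smaller than $M$ in absolute value occur.

```python
import numpy as np

def weight_on_grid(b, M):
    """w_N(x, y) = (2pi)^-2 * sum b[nu, mu] e^{-i nu x + i mu y} on an M x M grid over [-pi, pi)."""
    N = b.shape[0]
    grid = -np.pi + 2 * np.pi * np.arange(M) / M
    E = np.exp(-1j * np.outer(grid, np.arange(1, N + 1)))   # E[j, nu-1] = e^{-i nu x_j}
    return grid, E @ b @ E.conj().T / (2 * np.pi) ** 2     # entry [j, k] = w_N(x_j, y_k)

def coefficients_from_grid(grid, w, N):
    """Riemann sum for b[nu, mu] = iint e^{i nu x - i mu y} w_N(x, y) dx dy, relation (2)."""
    h = grid[1] - grid[0]
    E = np.exp(-1j * np.outer(grid, np.arange(1, N + 1)))
    return (E.conj().T @ w @ E) * h * h

def form_via_weight(grid, w, x):
    """Riemann sum for iint p(x) conj(p(y)) w_N(x, y) dx dy with p(t) = sum_nu x_nu e^{i nu t}."""
    h = grid[1] - grid[0]
    p = np.exp(1j * np.outer(grid, np.arange(1, len(x) + 1))) @ x
    return (p @ w @ p.conj()) * h * h

rng = np.random.default_rng(0)
N, M = 6, 16                                  # M >= N makes the grid sums exact up to rounding
C = rng.standard_normal((N, N))
b = C @ C.T                                   # symmetric, nonnegative definite coefficients
grid, w = weight_on_grid(b, M)
x = rng.standard_normal(N)

alpha = x @ b @ x / (2 * np.pi * N)           # the quadratic form (1)
alpha_w = form_via_weight(grid, w, x) / (2 * np.pi * N)
print(np.allclose(coefficients_from_grid(grid, w, N), b))   # True
print(np.isclose(alpha, alpha_w.real), alpha >= 0)          # True True
```

Because $b = CC^{\mathsf T}$ is nonnegative definite here, the form is nonnegative for every sample.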

For the moment consider only the expected value of $\alpha_N^*$ for large values of $N$ when the weight function $W_N(x,y) = W(x,y)$ is fixed. Suppose that $f(\lambda)$ has an absolutely convergent Fourier series, so that

$$\sum_{\nu=0}^{\infty}|r_\nu| < \infty. \qquad (3)$$


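Then

$$E\alpha_N^* = \frac{1}{2\pi}\sum_{p=-N+1}^{N-1}r_p\,w_p^{(N)},$$

where

$$w_p^{(N)} = \begin{cases}\dfrac{1}{N}\displaystyle\sum_{\nu=1+p}^{N}b_{\nu,\nu-p}^{(N)}, & p \ge 0,\\[2ex] \dfrac{1}{N}\displaystyle\sum_{\nu=1}^{N+p}b_{\nu,\nu-p}^{(N)}, & p < 0.\end{cases}$$

For fixed $p > 0$ we have

$$w_p^{(N)} = \frac{1}{N}\sum_{\nu=1+p}^{N}\iint e^{i\nu x - i(\nu-p)y}\,dW(x,y) = \iint e^{ipy}\,e^{i(1+p)(x-y)}\,\frac{1 - e^{i(N-p)(x-y)}}{N\left(1 - e^{i(x-y)}\right)}\,dW(x,y),$$

where on the diagonal $x = y$ the integrand is to be read as its limiting value $e^{ipy}(N-p)/N$.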


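Introducing the function

$$\Delta_\varepsilon(u) = \begin{cases}1, & |u| \le \varepsilon,\\ 0, & |u| > \varepsilon,\end{cases} \qquad \varepsilon > 0,$$

we have

$$\Big|w_p^{(N)} - \lim_{\varepsilon\to0}\iint e^{ipy}\Delta_\varepsilon(x-y)\,dW(x,y)\Big| \le \lim_{\varepsilon\to0}\iint\Big|\frac{1}{N}\sum_{\nu=1+p}^{N}e^{i\nu(x-y)} - \Delta_\varepsilon(x-y)\Big|\,|dW(x,y)|.$$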


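Let $\varepsilon$ first tend to zero. The integrand is then bounded by 2 and tends to zero for all $x$, $y$ as $N\to\infty$ (on the diagonal it equals $p/N$, off the diagonal it is at most $2/(N|1-e^{i(x-y)}|)$), so that by dominated convergence

$$\lim_{N\to\infty}w_p^{(N)} = \int_{-\pi}^{\pi}e^{ipy}\,dm(y),$$

where

$$m(y) = \lim_{\varepsilon\to0}\iint_{u,v\le y}\Delta_\varepsilon(u-v)\,dW(u,v).$$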


If one were to use the usual differential formalism, one would say that

$$dm(y) = dW(y,y).$$

The corresponding limiting relation holds for $p < 0$. Since the weights $w_p^{(N)}$ are bounded in absolute value by

$$\iint|dW(x,y)| < \infty$$

and (3) holds, dominated convergence gives

$$\lim_{N\to\infty}E\alpha_N^* = \frac{1}{2\pi}\sum_{p=-\infty}^{\infty}r_p\int_{-\pi}^{\pi}e^{ipy}\,dm(y) = \int_{-\pi}^{\pi}f(\lambda)\,dm(\lambda).$$

Clearly $dm(\lambda) = 0$ if $W(x,y)$ is absolutely continuous. If one were only interested in the expected value of the estimate for large samples, it would not matter how $dW(x,y)$ is chosen outside of the main diagonal. Of course one wants to reduce the variance as well as the bias of the estimate, and there are many possibly good estimates. When estimating the spectral mass in a fixed interval $(\lambda_1, \lambda_2)$, $0 < \lambda_1 < \lambda_2 < \pi$, $W(x,y)$ could be chosen so as to have $dW(x,y) = 0$ when $x \ne y$, $dm(\lambda) = \frac12\,d\lambda$ when $\lambda_1 < |\lambda| < \lambda_2$, and $dm(\lambda) = 0$ outside $\lambda_1 < |\lambda| < \lambda_2$; since $f$ is even, the limit of $E\alpha_N^*$ is then $\int_{\lambda_1}^{\lambda_2}f(\lambda)\,d\lambda = F(\lambda_2) - F(\lambda_1)$. In the important case of estimating the spectral density at $\lambda$, the weight function must vary with $N$, and it seems plausible that it should in some way become more and more concentrated about the points $(\lambda,\lambda)$ and $(-\lambda,-\lambda)$ as $N\to\infty$. The reason for the symmetry in the choice of the weight function is that $f(\lambda) = f(-\lambda)$ and $dW_N(x,y) = dW_N(-y,-x)$.
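The formula $E\alpha_N^* = \frac{1}{2\pi}\sum_p r_p w_p^{(N)}$ is a regrouping of $\frac{1}{2\pi N}\sum_{\nu,\mu}b_{\nu\mu}^{(N)}r_{\nu-\mu}$ along the diagonals $\nu - \mu = p$. A minimal NumPy check follows; the AR(1) covariances $r_p = \phi^{|p|}/(1-\phi^2)$ (unit innovation variance), the random symmetric coefficients and the helper name `diagonal_weights` are illustrative assumptions, not part of the text.

```python
import numpy as np

def diagonal_weights(b):
    """w_p = (1/N) * sum over nu - mu = p of b[nu, mu], for p = -(N-1), ..., N-1 (section 4.1)."""
    N = b.shape[0]
    # np.diagonal(b, offset=-p) holds the entries b[i, i - p], i.e. those with nu - mu = p
    return np.array([np.diagonal(b, offset=-p).sum() for p in range(-(N - 1), N)]) / N

rng = np.random.default_rng(1)
N, phi = 8, 0.6
A = rng.standard_normal((N, N))
b = (A + A.T) / 2                                        # symmetric coefficients b_{nu mu}

lags = np.arange(-(N - 1), N)
r = phi ** np.abs(lags) / (1 - phi ** 2)                 # AR(1) covariances r_p
idx = np.arange(N)
R = phi ** np.abs(np.subtract.outer(idx, idx)) / (1 - phi ** 2)   # R[nu, mu] = r_{nu - mu}

mean_direct = np.trace(b @ R) / (2 * np.pi * N)          # (1/2piN) sum b_{nu mu} r_{nu - mu}
mean_diag = np.dot(r, diagonal_weights(b)) / (2 * np.pi) # (1/2pi) sum_p r_p w_p
print(np.isclose(mean_direct, mean_diag))                # True
```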

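Now consider the variance of the estimate $\alpha_N^*$ when the process sampled is a normal process. Since

$$\alpha_N^* = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}x_\nu x_\mu,$$

it is clear that

$$D^2[\alpha_N^*] = \frac{1}{4\pi^2N^2}\sum_{\nu_1,\mu_1,\nu_2,\mu_2=1}^{N}b_{\nu_1\mu_1}^{(N)}b_{\nu_2\mu_2}^{(N)}\big[r_{\nu_1-\nu_2}r_{\mu_1-\mu_2} + r_{\nu_1-\mu_2}r_{\mu_1-\nu_2}\big] = \sum_{p,q}\big[\alpha_{pq} + \beta_{pq}\big]r_pr_q = S_1 + S_2,$$

where $\alpha_{pq}$ and $\beta_{pq}$ collect the coefficients of $r_pr_q$ coming from the first and the second term in the brackets (see the discussion in section 4.2 on the moments of linear processes). We assume that $W_N(x,y)$ is absolutely continuous. It will be enough to consider only $S_1$, since

$$S_1 = S_2 = \frac{1}{4\pi^2N^2}\iint_{-\pi}^{\pi}\Big|\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}e^{-i\nu u + i\mu v}\Big|^2f(u)f(v)\,du\,dv.$$

This follows from the fact that $b_{\nu\mu}^{(N)} = b_{\mu\nu}^{(N)}$.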


Note that if we set

$$w_N(x,y) = \frac{1}{(2\pi)^2}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}e^{-i\nu x + i\mu y},$$

then

$$D^2[\alpha_N^*] = \frac{8\pi^2}{N^2}\iint_{-\pi}^{\pi}|w_N(u,v)|^2f(u)f(v)\,du\,dv. \qquad (4)$$

However, $w_N(x,y)$ may be some other weight function corresponding to the coefficients $b_{\nu\mu}^{(N)}$, $\nu,\mu = 1,\ldots,N$, and we would like to know under what conditions relation (4) still holds asymptotically as $N\to\infty$.
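For a normal process the variance of the quadratic form $x^{\mathsf T}Bx$ with symmetric $B$ is $2\operatorname{tr}(BRBR)$, where $R = (r_{\nu-\mu})$, so relation (4) can be checked numerically. The sketch below assumes NumPy and, purely as an example, an AR(1) process with spectral density $f(\lambda) = \frac{1}{2\pi}|1-\phi e^{-i\lambda}|^{-2}$ (so that $r_p = \phi^{|p|}/(1-\phi^2)$); the double integral in (4) is approximated by a Riemann sum on a periodic grid, which is extremely accurate here because the integrand is smooth and $2\pi$-periodic.

```python
import numpy as np

rng = np.random.default_rng(2)
N, phi, M = 6, 0.5, 256
A = rng.standard_normal((N, N))
b = (A + A.T) / 2                                        # symmetric coefficients b_{nu mu}

idx = np.arange(N)
R = phi ** np.abs(np.subtract.outer(idx, idx)) / (1 - phi ** 2)
var_exact = 2 * np.trace(b @ R @ b @ R) / (2 * np.pi * N) ** 2   # Gaussian: Var(x'Bx) = 2 tr(BRBR)

grid = -np.pi + 2 * np.pi * np.arange(M) / M
E = np.exp(-1j * np.outer(grid, idx + 1))                # E[j, nu-1] = e^{-i nu x_j}
w = E @ b @ E.conj().T / (2 * np.pi) ** 2                # w_N(x_j, y_k)
f = 1 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * grid)) ** 2)   # AR(1) spectral density
h = 2 * np.pi / M
var_formula = 8 * np.pi ** 2 / N ** 2 * h * h * np.sum(np.abs(w) ** 2 * np.outer(f, f))
print(np.isclose(var_exact, var_formula))                # True
```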

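Now $S_1$ is a monotone functional of the spectral density, since it is the integral of $f(u)f(v)$ against a nonnegative kernel. Let $f(\lambda)$ be positive and continuous. Given any $\varepsilon > 0$, we can find two positive trigonometric polynomials

$$p_l(\lambda) = \sum_{j=-n}^{n}\gamma_{lj}e^{ij\lambda}, \qquad l = 1,2,$$

such that $p_1(\lambda) \le f(\lambda) \le p_2(\lambda)$ and $|p_1(\lambda) - p_2(\lambda)| < \varepsilon$. The inequality

$$S_1(p_1(\lambda)) \le S_1(f(\lambda)) \le S_1(p_2(\lambda))$$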

holds. The summation in the expression

$$S_1(p(\lambda)) = \sum_{p,q}\alpha_{pq}r_pr_q$$

is over a finite range which does not vary with $N$ if $p(\lambda)$ is a trigonometric polynomial (the $r_p$ now being the covariances belonging to the spectral density $p(\lambda)$, which vanish for $|p| > n$). Note that

$$\alpha_{pq} = \frac{1}{4\pi^2N^2}\sum_{\nu=p+1}^{N}\sum_{\mu=q+1}^{N}b_{\nu\mu}^{(N)}b_{\nu-p,\mu-q}^{(N)}, \qquad p,q > 0,$$

while analogous expressions hold when $p$, $q$ have other signs. We shall obtain the desired asymptotic relation under appropriate conditions for $f(\lambda) = p(\lambda)$, a positive trigonometric polynomial. Since the summation in $S_1(p(\lambda))$ is over a fixed and finite range, one need only discuss what happens to a typical coefficient $\alpha_{pq}$. Let the coefficients $b_{\nu\mu}^{(N)}$ be the Fourier coefficients of $w_N(x,y)$. Here the coefficients $b_{\nu\mu}^{(N)}$ are not necessarily zero if $\nu$, $\mu$ are not both in the range 1 to $N$. We shall show that if

$$\frac{\sum_{\nu,\mu=s}^{N-s}|b_{\nu\mu}^{(N)}|^2}{\sum_{\nu,\mu=-\infty}^{\infty}|b_{\nu\mu}^{(N)}|^2} \to 1 \qquad (5)$$

as $N\to\infty$ for every fixed $s > 0$, then

$$D^2[\alpha_N^*] \sim \frac{8\pi^2}{N^2}\iint_{-\pi}^{\pi}|w_N(x,y)|^2f(x)f(y)\,dx\,dy \qquad (6)$$

as $N\to\infty$. Now

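$$[\max f(\lambda)]^2\sum_{\nu,\mu=-\infty}^{\infty}|b_{\nu\mu}^{(N)}|^2 \ge (2\pi)^2\iint_{-\pi}^{\pi}|w_N(x,y)|^2f(x)f(y)\,dx\,dy \ge [\min f(\lambda)]^2\sum_{\nu,\mu=-\infty}^{\infty}|b_{\nu\mu}^{(N)}|^2,$$

so the right side of (6) is of exact order $N^{-2}\sum|b_{\nu\mu}^{(N)}|^2$. It will be enough to show that $\alpha_{pq}$ is asymptotically the same as

$$A_{pq} = \frac{1}{4\pi^2N^2}\sum_{\nu,\mu=-\infty}^{\infty}b_{\nu\mu}^{(N)}b_{\nu-p,\mu-q}^{(N)},$$

in the sense that $|\alpha_{pq} - A_{pq}| = o\big(N^{-2}\sum_{\nu,\mu}|b_{\nu\mu}^{(N)}|^2\big)$. But the terms of $A_{pq}$ missing from $\alpha_{pq}$ all have $(\nu,\mu)$ outside the square $m+1 \le \nu,\mu \le N-m$, where $m = \max(|p|,|q|)$, so that by the Schwarz inequality

$$\frac{4\pi^2N^2|\alpha_{pq} - A_{pq}|}{\sum_{\nu,\mu=-\infty}^{\infty}|b_{\nu\mu}^{(N)}|^2} \le \left(1 - \frac{\sum_{\nu,\mu=m+1}^{N-m}|b_{\nu\mu}^{(N)}|^2}{\sum_{\nu,\mu=-\infty}^{\infty}|b_{\nu\mu}^{(N)}|^2}\right)^{1/2} \to 0$$

as $N\to\infty$, by (5). Since there are only a finite number of coefficients $\alpha_{pq}$, it follows that

$$S_1(p(\lambda)) = \sum_{p,q}\alpha_{pq}r_pr_q \sim \frac{1}{4\pi^2N^2}\sum_{p,q}\sum_{\nu,\mu=-\infty}^{\infty}b_{\nu\mu}^{(N)}b_{\nu-p,\mu-q}^{(N)}r_pr_q = \frac{(2\pi)^2}{N^2}\iint_{-\pi}^{\pi}|w_N(x,y)|^2p(x)p(y)\,dx\,dy.$$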

On letting the maximal difference $\varepsilon$ between the two trigonometric polynomials $p_l(\lambda)$ approximating $f(\lambda)$ tend to zero, we see that

$$D^2[\alpha_N^*] \sim \frac{8\pi^2}{N^2}\iint_{-\pi}^{\pi}|w_N(x,y)|^2f(x)f(y)\,dx\,dy$$

as $N\to\infty$. A condition similar to (5) is assumed in section 4.5 to establish the validity of the asymptotic relation (6).

In the case of a fixed weight function $w_N(x,y) = w(x,y)$ the variance goes down to zero as $N^{-2}$. However, since $dm = 0$ for an absolutely continuous weight function, the expected value of $\alpha_N^*$ then tends to zero, and such an estimate would not be of any use in estimating the spectrum. In particular, in estimating the spectral density $f(\lambda)$ one could not use a fixed weight function $w(x,y) = w_N(x,y)$. A reasonable weight function $w_N(x,y)$ would concentrate more and more mass in the neighborhood of the points $(\lambda,\lambda)$ and $(-\lambda,-\lambda)$ as $N\to\infty$ in order that the estimate $\alpha_N^*$ be asymptotically unbiased. On the other hand, $w_N(x,y)$ must not concentrate the mass too rapidly, since one could then not have $D^2[\alpha_N^*] \to 0$ as $N\to\infty$. This will be made more explicit later on in the discussion of the mean square error of various specific estimates in section 4.6.

If $\alpha_N^*$ is an asymptotically unbiased estimate of the spectral density $f(\lambda)$, result (6) concerning the asymptotic variance of the estimate can be shown to hold in the more general context of a linear scheme by using the method of section 4.2.

We  discuss  the  problems  of  this  chapter  in  the  context  of  linear  schemes 
for  convenience  and  continuity.  It  should,  however,  by  now  be  clear  that 
one  really  only  needs  assumptions  on  moments  up  to  the  4th  order  (see 
Parzen  [1]). 


4.2.  An  Optimum  Property  of  Spectrograph  Estimates

Probably the most important estimates of the spectral density are of the form

$$f_N^*(\lambda) = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}x_\nu x_\mu\int_{-\pi}^{\pi}e^{i(\nu-\mu)y}w_N(y)\,dy, \qquad (1)$$

that is, the weight function $w_N(x,y)$ of section 4.1 has all its variation on the main diagonal $x = y$. We shall call such an estimate a spectrograph estimate. On interchanging the order of summation and integration in (1), it can be rewritten as

$$f_N^*(\lambda) = \int_{-\pi}^{\pi}I_N(y)\,w_N(y)\,dy,$$

where

$$I_N(y) = \frac{1}{2\pi N}\Big|\sum_{\nu=1}^{N}x_\nu e^{i\nu y}\Big|^2$$

is the so-called periodogram that was introduced in section 3.1. We can also write

$$f_N^*(\lambda) = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}x_\nu x_\mu w_{\nu-\mu}^{(N)} = \frac{1}{2\pi N}\sum_{v=-N+1}^{N-1}w_v^{(N)}C_v,$$

where the weights

$$w_v^{(N)} = \int_{-\pi}^{\pi}e^{ivy}w_N(y)\,dy$$

and $C_v$ is the product sum of lag $v$,

$$C_v = \sum_{t=1}^{N-|v|}x_tx_{t+|v|}.$$
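The three expressions for a spectrograph estimate (the double sum, the lag-product form and the periodogram average) can be compared numerically. In this NumPy sketch the triangular (Bartlett-type) lag weights $w_v^{(N)} = (1-|v|/m)^+\cos v\lambda$, the bandwidth `m`, the frequency `lam` and the helper names are illustrative choices, not prescribed by the text. The weight function is $w_N(y) = \frac{1}{2\pi}\sum_v w_v^{(N)}e^{-ivy}$, and the integral of $I_N(y)w_N(y)$ is evaluated by a Riemann sum on $M = 4N$ points, which is exact for these trigonometric polynomials up to rounding.

```python
import numpy as np

def lag_products(x):
    """C_v = sum_{t=1}^{N-|v|} x_t x_{t+|v|}, returned for v = -(N-1), ..., N-1."""
    N = len(x)
    c = np.array([np.dot(x[:N - k], x[k:]) for k in range(N)])
    return np.concatenate([c[:0:-1], c])

def periodogram(x, y):
    """I_N(y) = |sum_nu x_nu e^{i nu y}|^2 / (2 pi N) at the points y."""
    N = len(x)
    return np.abs(np.exp(1j * np.outer(y, np.arange(1, N + 1))) @ x) ** 2 / (2 * np.pi * N)

rng = np.random.default_rng(3)
N, m, lam = 64, 10, 1.0                    # hypothetical sample size, bandwidth and frequency
x = rng.standard_normal(N)
v = np.arange(-(N - 1), N)
wv = np.clip(1 - np.abs(v) / m, 0, None) * np.cos(v * lam)   # lag weights w_v = w_{-v}

# Double-sum form (1): (1/2piN) sum_{nu,mu} x_nu x_mu w_{nu-mu}
lagmat = np.subtract.outer(np.arange(N), np.arange(N))
T = np.clip(1 - np.abs(lagmat) / m, 0, None) * np.cos(lagmat * lam)
est_double = x @ T @ x / (2 * np.pi * N)

# Lag-product form: (1/2piN) sum_v w_v C_v
est_lag = np.dot(wv, lag_products(x)) / (2 * np.pi * N)

# Periodogram average: integral of I_N(y) w_N(y) dy with w_N(y) = (1/2pi) sum_v w_v e^{-ivy}
M = 4 * N
y = -np.pi + 2 * np.pi * np.arange(M) / M
w_of_y = np.cos(np.outer(y, v)) @ wv / (2 * np.pi)           # real because w_v is even
est_avg = np.sum(periodogram(x, y) * w_of_y) * (2 * np.pi / M)

print(np.isclose(est_double, est_lag), np.isclose(est_lag, est_avg))   # True True
```

The lag-product form is usually the cheapest to compute when only a few lag weights are nonzero.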
The characteristic property of a spectrograph estimate is that the coefficients $b_{\nu\mu}^{(N)}$ depend only upon the difference $\nu - \mu$,

$$b_{\nu\mu}^{(N)} = w_{\nu-\mu}^{(N)},$$

or equivalently that the spectrograph estimate is an average of the periodogram. The assumption that $b_{\nu\mu}^{(N)} = b_{\mu\nu}^{(N)}$ implies that $w_p^{(N)} = w_{-p}^{(N)}$.

One might suspect that, because of the stationary nature of the observed process, it would be advantageous to choose the $b_{\nu\mu}^{(N)}$ in the way mentioned in the last paragraph. Such is the case, at least for large samples. To prove this, a result on certain moments of a linear process will be needed.

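A linear process can be written in the form

$$x_t = \sum_{\nu=-\infty}^{\infty}a_{t-\nu}\xi_\nu, \qquad \sum_{\nu=-\infty}^{\infty}a_\nu^2 < \infty,$$

where $\xi_t$ is pure white noise with

$$E\xi_t = 0, \qquad E\xi_t^2 = 1.$$

Assuming that the fourth moment of $\xi_t$ exists, we introduce the quantity

$$\epsilon = E\xi_t^4 - 3,$$

which is the fourth cumulant of $\xi_t$. Then

$$\begin{aligned}E\,x_\alpha x_\beta x_\gamma x_\delta &= \sum_{\nu_1,\nu_2,\nu_3,\nu_4=-\infty}^{\infty}a_{\alpha-\nu_1}a_{\beta-\nu_2}a_{\gamma-\nu_3}a_{\delta-\nu_4}\,E\,\xi_{\nu_1}\xi_{\nu_2}\xi_{\nu_3}\xi_{\nu_4}\\ &= \epsilon\sum_{\nu=-\infty}^{\infty}a_{\alpha-\nu}a_{\beta-\nu}a_{\gamma-\nu}a_{\delta-\nu} + \sum_{\nu,\mu=-\infty}^{\infty}a_{\alpha-\nu}a_{\beta-\nu}a_{\gamma-\mu}a_{\delta-\mu}\\ &\quad + \sum_{\nu,\mu=-\infty}^{\infty}a_{\alpha-\nu}a_{\beta-\mu}a_{\gamma-\mu}a_{\delta-\nu} + \sum_{\nu,\mu=-\infty}^{\infty}a_{\alpha-\nu}a_{\beta-\mu}a_{\gamma-\nu}a_{\delta-\mu}.\end{aligned}$$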


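Since the covariance of a linear process is given by

$$r_{\alpha-\beta} = \sum_{\nu=-\infty}^{\infty}a_{\alpha-\nu}a_{\beta-\nu},$$

it follows that

$$\operatorname{cov}(x_\alpha x_\beta,\,x_\gamma x_\delta) = \epsilon\sum_{\nu=-\infty}^{\infty}a_{\alpha-\nu}a_{\beta-\nu}a_{\gamma-\nu}a_{\delta-\nu} + r_{\alpha-\delta}r_{\beta-\gamma} + r_{\alpha-\gamma}r_{\beta-\delta}.$$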
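The covariance formula can be verified exactly for a short moving average driven by Rademacher noise ($\xi_\nu = \pm1$ with probability $1/2$ each, so that $E\xi_\nu^2 = 1$ and $\epsilon = E\xi_\nu^4 - 3 = -2$). Enumerating all sign patterns of the noise variables that enter $x_0, \ldots, x_3$ gives exact expectations, so no simulation error is involved. The coefficients $a_0, a_1, a_2$ and all names below are hypothetical; the sketch assumes only NumPy and the standard library.

```python
import itertools
import numpy as np

a = {0: 1.0, 1: 0.6, 2: -0.3}            # hypothetical coefficients a_0, a_1, a_2 (others zero)
times = range(4)                          # alpha, beta, gamma, delta range over 0..3
noise_idx = list(range(-2, 4))            # every xi_nu that enters x_0, ..., x_3

def coef(k):
    return a.get(k, 0.0)

def r(k):
    """Covariance r_k = sum_nu a_nu a_{nu+|k|}."""
    return sum(coef(j) * coef(j + abs(k)) for j in a)

# All 2^6 equally likely Rademacher sign patterns; row = outcome, column = x_t for t = 0..3
patterns = itertools.product([-1.0, 1.0], repeat=len(noise_idx))
X = np.array([[sum(coef(t - nu) * s for nu, s in zip(noise_idx, signs)) for t in times]
              for signs in patterns])
epsilon = 1.0 - 3.0                       # E xi^4 - 3 for Rademacher noise

def cov_formula(al, be, ga, de):
    fourth = sum(coef(al - nu) * coef(be - nu) * coef(ga - nu) * coef(de - nu) for nu in noise_idx)
    return epsilon * fourth + r(al - de) * r(be - ga) + r(al - ga) * r(be - de)

def cov_exact(al, be, ga, de):
    p1, p2 = X[:, al] * X[:, be], X[:, ga] * X[:, de]
    return np.mean(p1 * p2) - np.mean(p1) * np.mean(p2)

max_err = max(abs(cov_exact(*q) - cov_formula(*q)) for q in itertools.product(times, repeat=4))
print(max_err < 1e-12)                    # True
```

For Gaussian noise $\epsilon = 0$ and only the two covariance products remain.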

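Now consider a general estimate

$$\alpha_N^* = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}x_\nu x_\mu, \qquad b_{\nu\mu}^{(N)} = \iint_{-\pi}^{\pi}e^{i\nu x - i\mu y}w_N(x,y)\,dx\,dy,$$

with nonnegative definite weight functions $w_N(x,y)$ and such that the integrals

$$\iint_{-\pi}^{\pi}|w_N(x,y)|\,dx\,dy \le K < \infty \qquad (2)$$

are uniformly bounded. Introduce

$$w_p^{(N)} = \frac{1}{N-|p|}\sum_{\nu}b_{\nu,\nu-p}^{(N)},$$

where the summation with respect to $\nu$ is extended over the range $1+p, 2+p, \ldots, N$ if $p \ge 0$ and over $1, 2, \ldots, N+p$ if $p < 0$. The estimate

$$f_N^*(\lambda) = \frac{1}{2\pi N}\sum_{p=-N+1}^{N-1}w_p^{(N)}C_p$$

is a spectrograph estimate. Now $f_N^*(\lambda)$ has the same mean value as $\alpha_N^*$, since $EC_p = (N-|p|)r_p$ and therefore

$$Ef_N^*(\lambda) = \frac{1}{2\pi N}\sum_{p=-N+1}^{N-1}(N-|p|)\,w_p^{(N)}r_p = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}r_{\nu-\mu} = E\alpha_N^*,$$

so that with respect to bias they are equally good (or bad).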
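Averaging the coefficients $b_{\nu\mu}^{(N)}$ along each diagonal, as in the definition of $w_p^{(N)}$ above, produces the Toeplitz matrix of a spectrograph estimate with the same mean. The following NumPy sketch (random symmetric $b_{\nu\mu}^{(N)}$ and AR(1) covariances chosen only for illustration; `spectrograph_projection` is our own name) checks the equality of the means, that the deviations $d_{\nu\mu}^{(N)} = b_{\nu\mu}^{(N)} - w_{\nu-\mu}^{(N)}$ sum to zero along every diagonal, and that $\sum d_{\nu_1\mu_1}d_{\nu_2\mu_2}r_{\nu_1-\nu_2}r_{\mu_1-\mu_2} = \operatorname{tr}(DRDR)$ is nonnegative.

```python
import numpy as np

def spectrograph_projection(b):
    """Toeplitz matrix T[nu, mu] = w_{nu-mu}, where w_p averages b over the diagonal nu - mu = p."""
    N = b.shape[0]
    T = np.zeros_like(b)
    for p in range(-(N - 1), N):
        diag = np.diagonal(b, offset=-p)          # entries b[nu, mu] with nu - mu = p
        rows = np.arange(max(p, 0), max(p, 0) + len(diag))
        T[rows, rows - p] = diag.mean()           # w_p = sum / (N - |p|)
    return T

rng = np.random.default_rng(4)
N, phi = 10, 0.7
A = rng.standard_normal((N, N))
b = (A + A.T) / 2
T = spectrograph_projection(b)
d = b - T                                         # the deviations d_{nu mu}

idx = np.arange(N)
R = phi ** np.abs(np.subtract.outer(idx, idx)) / (1 - phi ** 2)   # AR(1) covariances
mean_general = np.trace(b @ R) / (2 * np.pi * N)
mean_spectrograph = np.trace(T @ R) / (2 * np.pi * N)
diag_sums = [np.diagonal(d, offset=k).sum() for k in range(-(N - 1), N)]

print(np.isclose(mean_general, mean_spectrograph))   # True
print(np.allclose(diag_sums, 0))                     # True
print(np.trace(d @ R @ d @ R) >= -1e-12)             # True
```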

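We assume that $\alpha_N^*$ is an asymptotically unbiased estimate of the spectral density $f(\lambda)$. It then follows that

$$E\alpha_N^* = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}\int_{-\pi}^{\pi}e^{-i(\nu-\mu)y}f(y)\,dy = \frac{2\pi}{N}\int_{-\pi}^{\pi}w_N(y,y)f(y)\,dy \to f(\lambda)$$

as $N\to\infty$ for any continuous nonnegative function $f(y)$ symmetric about zero. But $w_N(x,y)$ is nonnegative definite, so that $w_N(y,y) \ge 0$. Thus for any fixed $\varepsilon > 0$

$$\frac{2\pi}{N}\int_{|y\pm\lambda|>\varepsilon}w_N(y,y)\,dy \to 0, \qquad \frac{2\pi}{N}\int_{|y\pm\lambda|<\varepsilon}w_N(y,y)\,dy \to 1$$

as $N\to\infty$, where $\{|y\pm\lambda| < \varepsilon\}$ denotes the union of the $\varepsilon$-neighborhoods of $\lambda$ and $-\lambda$ and $\{|y\pm\lambda| > \varepsilon\}$ its complement. On using the Schwarz inequality on this union, whose measure is at most $4\varepsilon$, one can see that for $N$ large

$$(1-\varepsilon)^2 \le 4\varepsilon\,\frac{4\pi^2}{N^2}\int_{-\pi}^{\pi}w_N^2(y,y)\,dy.$$

Since $\varepsilon$ is arbitrary, $\frac{4\pi^2}{N^2}\int_{-\pi}^{\pi}w_N^2(y,y)\,dy \to \infty$ as $N\to\infty$. It then follows that

$$\frac{(2\pi)^3}{N^2}\int_{-\pi}^{\pi}w_N^2(y,y)\,dy = \frac{1}{N^2}\sum_{p=-N+1}^{N-1}\Big|\sum_{\nu-\mu=p}b_{\nu\mu}^{(N)}\Big|^2 \to \infty$$

as $N\to\infty$. Now consider the variance of $\alpha_N^*$. It is given by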
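$$4\pi^2N^2D^2[\alpha_N^*] = \sum_{\nu_1,\mu_1,\nu_2,\mu_2=1}^{N}b_{\nu_1\mu_1}^{(N)}b_{\nu_2\mu_2}^{(N)}\Big[r_{\nu_1-\nu_2}r_{\mu_1-\mu_2} + r_{\nu_1-\mu_2}r_{\mu_1-\nu_2} + \epsilon\sum_{v=-\infty}^{\infty}a_{\nu_1-v}a_{\mu_1-v}a_{\nu_2-v}a_{\mu_2-v}\Big]. \qquad (3)$$

But

$$\sum_{\nu_1,\mu_1,\nu_2,\mu_2=1}^{N}b_{\nu_1\mu_1}^{(N)}b_{\nu_2\mu_2}^{(N)}\sum_{v=-\infty}^{\infty}a_{\nu_1-v}a_{\mu_1-v}a_{\nu_2-v}a_{\mu_2-v} = \sum_{v=-\infty}^{\infty}T_v,$$

where

$$T_v = \Big(\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}a_{\nu-v}a_{\mu-v}\Big)^2 = \Big|\iint_{-\pi}^{\pi}\sum_{k=1}^{N}a_{k-v}e^{ikx}\sum_{j=1}^{N}a_{j-v}e^{-ijy}\,w_N(x,y)\,dx\,dy\Big|^2 \le K^2\Big(\sum_{k=1}^{N}|a_{k-v}|\Big)^4.$$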
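Under the assumption that $a_k = O(1/|k|^2)$ as $|k|\to\infty$ we see that

$$\sum_{k=1}^{N}|a_{k-v}| = \begin{cases}O(1/|v|), & v < 0,\\ O(1), & 0 \le v \le N,\\ O\big(1/(v-N)\big), & v > N.\end{cases}$$

Thus

$$\sum_{v=-\infty}^{\infty}T_v = O(N)$$

as $N\to\infty$. But now

$$\alpha_1 = \sum_{\nu_1,\mu_1,\nu_2,\mu_2=1}^{N}b_{\nu_1\mu_1}^{(N)}b_{\nu_2\mu_2}^{(N)}r_{\nu_1-\nu_2}r_{\mu_1-\mu_2} \ge 4\pi^2[\min f(\lambda)]^2\sum_{\nu,\mu=1}^{N}|b_{\nu\mu}^{(N)}|^2 \ge 4\pi^2[\min f(\lambda)]^2N^{-1}\sum_{p=-N+1}^{N-1}\Big|\sum_{\nu-\mu=p}b_{\nu\mu}^{(N)}\Big|^2,$$

using the Schwarz inequality. The asymptotic unbiasedness of the estimate implies, by the result just obtained, that

$$\lim_{N\to\infty}\frac{\alpha_1}{N} = \infty.$$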

Thus for an asymptotically unbiased estimate of the spectral density, the variance component arising from the third term in (3) is negligible compared to the first two terms as $N\to\infty$. The same argument shows that this is also true of the corresponding spectrograph estimate.


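Now set

$$d_{\nu\mu}^{(N)} = b_{\nu\mu}^{(N)} - w_{\nu-\mu}^{(N)}.$$

Note that

$$\sum_{\nu-\mu=p}d_{\nu\mu}^{(N)} = 0.$$

Then

$$\sum b_{\nu_1\mu_1}^{(N)}b_{\nu_2\mu_2}^{(N)}r_{\nu_1-\nu_2}r_{\mu_1-\mu_2} = \sum\Big[w_{\nu_1-\mu_1}^{(N)}w_{\nu_2-\mu_2}^{(N)} + d_{\nu_1\mu_1}^{(N)}d_{\nu_2\mu_2}^{(N)} + w_{\nu_1-\mu_1}^{(N)}d_{\nu_2\mu_2}^{(N)} + d_{\nu_1\mu_1}^{(N)}w_{\nu_2-\mu_2}^{(N)}\Big]r_{\nu_1-\nu_2}r_{\mu_1-\mu_2}, \qquad (4)$$

all sums extending over $\nu_1,\mu_1,\nu_2,\mu_2 = 1,\ldots,N$. The first term corresponds to the variance of the spectrograph estimate. The second term is nonnegative, being of the same form as $S_1$ in section 4.1 with $b_{\nu\mu}^{(N)}$ replaced by $d_{\nu\mu}^{(N)}$, and it is large when the deviation of $\alpha_N^*$ from a spectrograph estimate is large.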

Let us now consider the third term. Sum $w_{\nu_1-\mu_1}^{(N)}d_{\nu_2\mu_2}^{(N)}r_{\nu_1-\nu_2}r_{\mu_1-\mu_2}$ over the range $\nu_1, \nu_2, \mu_1, \mu_2 = 1, 2, \ldots, N$. Consider first setting

$$\nu_1 - \nu_2 = p, \qquad \mu_1 - \mu_2 = q,$$

where, let us say, $p, q > 0$. Then let us set $\nu_2 - \mu_2 = s$, where $s > 0$, and sum the terms under these restraints. We then obtain

$$r_pr_q\,w_{s+p-q}^{(N)}\sum_{\nu=1}^{N-\max(s+p,\,q)}d_{\nu+s,\nu}^{(N)}. \qquad (5)$$
Now the coefficients $b_{\nu\mu}^{(N)}$, $w_p^{(N)}$, $d_{\nu\mu}^{(N)}$ are uniformly bounded in absolute value by some finite constant $K$, since (2) holds. Using the fact that

$$\sum_{\nu=1}^{N-s}d_{\nu+s,\nu}^{(N)} = 0,$$

so that the sum in (5) equals minus the sum of the at most $\max(p,q)$ remaining terms, it follows that (5) is bounded in absolute value by

$$K^2|r_pr_q|\max(|p|,|q|).$$

Arguing in a completely analogous manner one can show that the sum under the restraint of fixed $p$, $q$, $s$, whatever sign they may have, is bounded in absolute value by

$$K^2|r_pr_q|\max(|p|,|q|).$$
Now if $a_k = O(1/|k|^{2+\delta})$, $\delta > 0$, then $r_p = O(1/|p|^{2+\delta})$, since, with $|a_k| \le C/(|k|^{2+\delta}+1)$,

$$|r_p| = \Big|\sum_k a_ka_{k+p}\Big| \le \sum_k\frac{C^2}{(|k|^{2+\delta}+1)(|k+p|^{2+\delta}+1)} \le 2C^2\Big(\frac{2}{|p|}\Big)^{2+\delta}\sum_k\frac{1}{|k|^{2+\delta}+1},$$

because for each $k$ at least one of $|k|$, $|k+p|$ is at least $|p|/2$. Hence on summing over $p$, $q$ the bound

$$K^2\sum_{p,q}|r_p||r_q|\max(|p|,|q|) < \infty$$

is obtained. Now on summing over $s$, the third term is seen to be $O(N)$. The fourth term can be estimated in exactly the same manner. Since the first term of (4) diverges faster than $N$ as $N\to\infty$ and the second is nonnegative, it follows that nothing is gained, when the sample is large, by going outside the class of spectrograph estimates when considering asymptotically unbiased estimates of the spectral density. Clearly our criterion for a good estimate in this discussion has been a small mean square error.

We shall now formally restate what has been proved. Let the observed process $x_t$ be any linear process

$$x_t = \sum_{\nu=-\infty}^{\infty}a_{t-\nu}\xi_\nu, \qquad E\xi_\nu = 0,\quad E\xi_\nu^2 = 1,\quad E\xi_\nu^4 < \infty,\quad a_k = O\Big(\frac{1}{|k|^{2+\delta}}\Big),\ \delta > 0, \qquad (6)$$

with a positive spectral density. Let the estimate

$$\alpha_N^* = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}x_\nu x_\mu$$

be an asymptotically unbiased estimate of the spectral density of the observed process at the point $\lambda$, that is, an estimate of $f(\lambda)$, with a nonnegative definite weight function

$$w_N(x,y) = \frac{1}{(2\pi)^2}\sum_{\nu,\mu=1}^{N}b_{\nu\mu}^{(N)}e^{-i\nu x + i\mu y}, \qquad \iint_{-\pi}^{\pi}|w_N(x,y)|\,dx\,dy \le K.$$

There is then a spectrograph estimate with the same bias as $\alpha_N^*$ and a mean square error that is asymptotically not larger than that of the estimate $\alpha_N^*$.

In  spite  of  this,  there  are  estimates  that  are  not  spectrograph  estimates 
but  that  are  of  considerable  interest.  We  shall  discuss  such  a  class  of  estimates 
in  section  4.5.  This  class  is  a  natural  one  to  use  if  one  wishes  to  build  an 
analogue  computer  to  estimate  the  spectral  density. 

Note that an interesting result has been obtained as a by-product in the proof of this section. Under the conditions specified above, the variance of an asymptotically unbiased estimate of the spectral density tends to zero more slowly than $N^{-1}$ as $N\to\infty$, that is,

$$N^{-1} = o\big(D^2[f_N^*(\lambda)]\big).$$

4.3.  A  Remark  on  the  Bias  of  Spectrograph  Estimates 

Consider a spectrograph estimate

$$f_N^*(\lambda) = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}x_\nu x_\mu w_{\nu-\mu}^{(N)} = \int_{-\pi}^{\pi}I_N(x)\,w_N(x)\,dx = \frac{1}{2\pi N}\sum_{v=-N+1}^{N-1}C_vw_v^{(N)}$$

of $f(\lambda)$.
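Since $w_p^{(N)} = w_{-p}^{(N)}$, the weight function $w_N(x)$ can be chosen so that $w_N(x) = w_N(-x)$. Note that the periodogram is an even function. The mean value of the estimate is

$$Ef_N^*(\lambda) = \int_{-\pi}^{\pi}EI_N(x)\,w_N(x)\,dx.$$

But

$$EI_N(y) = \frac{1}{2\pi N}E\Big|\sum_{\nu=1}^{N}x_\nu e^{i\nu y}\Big|^2 = \frac{1}{2\pi N}\int_{-\pi}^{\pi}\Big|\sum_{\nu=1}^{N}e^{i\nu(u-y)}\Big|^2f(u)\,du = \frac{1}{2\pi N}\int_{-\pi}^{\pi}\frac{\sin^2\frac{N}{2}(y-u)}{\sin^2\frac12(y-u)}f(u)\,du.$$

Let $w_N(x) \ge 0$ and $\int_{-\pi}^{\pi}w_N(x)\,dx = 1$. Since the Fejér kernel satisfies $\frac{1}{2\pi N}\int_{-\pi}^{\pi}\frac{\sin^2\frac N2t}{\sin^2\frac12t}\,dt = 1$, the bias is then given by

$$b_N(\lambda) = Ef_N^*(\lambda) - f(\lambda) = \frac{1}{2\pi N}\iint_{-\pi}^{\pi}w_N(x)\frac{\sin^2\frac{N}{2}(y-x)}{\sin^2\frac12(y-x)}\big[f(y) - f(\lambda)\big]\,dy\,dx.$$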
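The Fejér-kernel representation of $EI_N(y)$ can be compared with the equivalent lag form $EI_N(y) = \frac{1}{2\pi}\sum_{|v|<N}(1-|v|/N)r_v\cos vy$, which follows from $EI_N(y) = \frac{1}{2\pi N}\sum_{\nu,\mu}r_{\nu-\mu}e^{i(\nu-\mu)y}$. The NumPy sketch below uses an AR(1) spectral density as an illustrative assumption, and the helper names are ours; the kernel is evaluated as $|\sum_\nu e^{i\nu t}|^2$ rather than as a ratio of sines so that the removable singularity at $t = 0$ causes no trouble.

```python
import numpy as np

def ar1_covariance(v, phi):
    return phi ** np.abs(v) / (1 - phi ** 2)

def expected_periodogram_lags(y, N, phi):
    """E I_N(y) = (1/2pi) sum_{|v|<N} (1 - |v|/N) r_v cos(v y)."""
    v = np.arange(-(N - 1), N)
    return np.sum((1 - np.abs(v) / N) * ar1_covariance(v, phi) * np.cos(v * y)) / (2 * np.pi)

def expected_periodogram_fejer(y, N, phi, M=4096):
    """(1/2piN) * integral of |sum_{nu=1}^N e^{i nu (u - y)}|^2 f(u) du, by a Riemann sum."""
    u = -np.pi + 2 * np.pi * np.arange(M) / M
    # |sum_nu e^{i nu t}|^2 = sin^2(N t / 2) / sin^2(t / 2); the sum form avoids 0/0 at t = 0
    kernel = np.abs(np.exp(1j * np.outer(u - y, np.arange(1, N + 1))).sum(axis=1)) ** 2
    f = 1 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * u)) ** 2)
    return np.sum(kernel * f) * (2 * np.pi / M) / (2 * np.pi * N)

N, phi = 32, 0.5
checks = [np.isclose(expected_periodogram_lags(y, N, phi), expected_periodogram_fejer(y, N, phi))
          for y in (0.0, 0.7, 2.5)]
print(all(checks))                        # True
```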


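Assuming that

$$f(x) - f(y) = O(|x-y|),$$

it is seen that

$$b_N(\lambda) - \int_{-\pi}^{\pi}w_N(x)\big[f(x) - f(\lambda)\big]\,dx = \frac{1}{2\pi N}\int_{-\pi}^{\pi}w_N(x)\int_{-\pi}^{\pi}\frac{\sin^2\frac N2(y-x)}{\sin^2\frac12(y-x)}\big[f(y) - f(x)\big]\,dy\,dx = O\Big(\frac{\log N}{N}\Big)$$

(see Fejér [1]), so that the bias

$$b_N(\lambda) = \int_{-\pi}^{\pi}w_N(x)\big[f(x) - f(\lambda)\big]\,dx + O\Big(\frac{\log N}{N}\Big). \qquad (1)$$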


Using the Schwarz inequality we find that

$$b_N(\lambda) = O(\varepsilon_N) + O\Big(\frac{\log N}{N}\Big), \qquad (2)$$

where

$$\varepsilon_N^2 = \int_0^{\pi}(x - |\lambda|)^2\,w_N(x)\,dx,$$

since $f(\lambda) = f(-\lambda)$ and $w_N(x) = w_N(-x)$. The quantity $\varepsilon_N^2$ describes the concentration of $w_N(x)$ about the points $\lambda$ and $-\lambda$; $\varepsilon_N$ might be called the width of the spectrograph. As we shall see later, the second term of (1) is in most cases negligible compared to the first. For the periodogram the second term is the main one, as the width $\varepsilon_N$ is zero. However, in spite of its very low bias, the periodogram is not useful as an estimate of the spectral density, since it is not even consistent (see section 4.6).

At best, (2) gives a crude estimate of the bias. In section 4.6 better estimates for the bias of specific spectrograph estimates will be obtained. The relation (1) implies that $f_N^*(\lambda)$ will be an asymptotically unbiased estimate of $f(\lambda)$ if $w_N(x)$ concentrates all its mass at the points $\lambda$, $-\lambda$ as $N\to\infty$, that is, if for any $\varepsilon > 0$

$$\int_{|x\pm\lambda|<\varepsilon}w_N(x)\,dx \to 1$$

as $N\to\infty$.
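For a spectrograph estimate with lag weights $w_v^{(N)}$ the decomposition (1) can be made explicit: $Ef_N^*(\lambda) = \frac{1}{2\pi N}\sum_v(N-|v|)w_v^{(N)}r_v$ and $\int w_N(x)f(x)\,dx = \frac{1}{2\pi}\sum_v w_v^{(N)}r_v$, so the remainder equals $-\frac{1}{2\pi N}\sum_v|v|w_v^{(N)}r_v$. When $\sum|v||r_v| < \infty$ (as for the AR(1) example below, an illustrative assumption) and $w_N \ge 0$, $\int w_N = 1$, so that $|w_v^{(N)}| \le 1$, this remainder is even $O(1/N)$. The NumPy sketch, with hypothetical triangular lag weights and bandwidth $m = \sqrt N$, checks this bound and also computes the width $\varepsilon_N$, which shrinks as $m$ grows.

```python
import numpy as np

def lag_weights(m, lam, v):
    """Hypothetical triangular lag weights w_v = (1 - |v|/m)^+ cos(v lam); w_N(x) >= 0, integral 1."""
    return np.clip(1 - np.abs(v) / m, 0, None) * np.cos(v * lam)

def scaled_remainder(N, m, lam, phi):
    """N * [E f*_N(lam) - integral of w_N(x) f(x) dx] = -(1/2pi) sum_v |v| w_v r_v (AR(1) r_v)."""
    v = np.arange(-(N - 1), N)
    wv, r = lag_weights(m, lam, v), phi ** np.abs(v) / (1 - phi ** 2)
    mean = np.sum((N - np.abs(v)) * wv * r) / (2 * np.pi * N)
    smoothed = np.sum(wv * r) / (2 * np.pi)
    return N * (mean - smoothed)

def width(m, lam, K=20000):
    """eps_N = sqrt(integral_0^pi (x - lam)^2 w_N(x) dx), with w_N(x) = (1/2pi) sum_v w_v cos(v x)."""
    x = (np.arange(K) + 0.5) * np.pi / K                       # midpoint rule on [0, pi]
    v = np.arange(-(m - 1), m)
    w_N = np.cos(np.outer(x, v)) @ lag_weights(m, lam, v) / (2 * np.pi)
    return np.sqrt(np.sum((x - lam) ** 2 * w_N) * np.pi / K)

phi, lam = 0.6, 1.0
k = np.arange(1, 400)
bound = np.sum(k * phi ** k) / (np.pi * (1 - phi ** 2))        # (1/2pi) sum_v |v| |r_v|
for N in (64, 256, 1024):
    m = int(np.sqrt(N))
    rem = scaled_remainder(N, m, lam, phi)
    print(N, round(float(rem), 4), abs(rem) <= bound)
print(width(4, lam) > width(64, lam))                           # True
```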

4.4.  The  Asymptotic  Variance  of  Spectrograph  Estimates 

There are two distinct situations that are considered in this section. We first discuss the case of an asymptotically unbiased estimate of the spectral density, where the corresponding weight function

$$w_N(x) = \frac{1}{2\pi}\sum_{p}w_p^{(N)}e^{ipx}$$

varies with $N$. Then the case of a fixed weight function $w(x) = w_N(x)$ is considered. The second problem is of special interest when one wishes to estimate the spectral mass in a fixed interval.

Let the spectrograph estimate be
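$$\alpha_N^* = \int_{-\pi}^{\pi}I_N(x)\,w_N(x)\,dx = \frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}w_{\nu-\mu}^{(N)}x_\nu x_\mu.$$

We assume that $w_N(x) = w_N(-x)$ and

$$\int_{-\pi}^{\pi}|w_N(x)|^2\,dx < \infty.$$

We shall also assume that

$$\frac{\sum_{r=-\infty}^{\infty}\min\big(1,|r|/N\big)\,|w_r^{(N)}|^2}{\sum_{r=-\infty}^{\infty}|w_r^{(N)}|^2} \to 0 \qquad (1)$$

as $N\to\infty$; roughly, the lag weights must be concentrated on lags that are small compared with $N$. The variance of the estimate, when the observed process is a linear scheme, is given by (see section 4.2)

$$4\pi^2N^2D^2[\alpha_N^*] = \sum_{\nu_1,\mu_1,\nu_2,\mu_2=1}^{N}w_{\nu_1-\mu_1}^{(N)}w_{\nu_2-\mu_2}^{(N)}\Big[r_{\nu_1-\nu_2}r_{\mu_1-\mu_2} + r_{\nu_1-\mu_2}r_{\mu_1-\nu_2} + \epsilon\sum_{v=-\infty}^{\infty}a_{\nu_1-v}a_{\mu_1-v}a_{\nu_2-v}a_{\mu_2-v}\Big]. \qquad (2)$$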

Let the spectral density $f(\lambda)$ be positive and continuous. One can then approximate $f(\lambda)$ from above and below by positive finite trigonometric forms in treating the first two terms on the right of (2). It is enough to consider the first term. Let $\nu_1 - \nu_2 = p$, $\mu_1 - \mu_2 = q$ be fixed. Then we have to deal with the coefficient of $r_pr_q$, divided by $N$, that is,

$$\alpha_{pq} = \frac1N\sum_{\nu=1}^{N-p}\sum_{\mu=1}^{N-q}w_{\nu-\mu+p-q}^{(N)}w_{\nu-\mu}^{(N)} = \frac1N\Big[\sum_{r=0}^{N-p-1}(N-p-r)\,w_{r+p-q}^{(N)}w_r^{(N)} + (N-p)\sum_{r=q-p}^{-1}w_{r+p-q}^{(N)}w_r^{(N)} + \sum_{r=-N+q+1}^{q-p-1}(N-q+r)\,w_{r+p-q}^{(N)}w_r^{(N)}\Big]$$

when, say, $p \ge q \ge 0$. Consider the difference between $\alpha_{pq}$ and
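$$A_{pq} = \sum_{r=-\infty}^{\infty}w_{r+p-q}^{(N)}w_r^{(N)}.$$

Now

$$|\alpha_{pq} - A_{pq}| \le \sum_{|r|<N}\big|w_{r+p-q}^{(N)}w_r^{(N)}\big|\,\frac{2|p|+2|q|+|r|}{N} + \sum_{|r|\ge N}\big|w_{r+p-q}^{(N)}w_r^{(N)}\big|.$$

By the Schwarz inequality each sum on the right is at most a constant multiple of $\big(\sum_r|w_r^{(N)}|^2\big)^{1/2}$ times the square root of an expression of the form $\sum_r\min\big(1,(|r|+c)/N\big)|w_r^{(N)}|^2$, with $c$ depending only on $p$ and $q$. Because of (1), such an expression is $o\big(\sum_r|w_r^{(N)}|^2\big)$. Thus

$$\frac{|\alpha_{pq} - A_{pq}|}{\sum_{r=-\infty}^{\infty}|w_r^{(N)}|^2} \to 0$$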

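as $N\to\infty$. If $f(\lambda)$ is approximated by the finite positive trigonometric form $p(\lambda)$, the first term divided by $N$,

$$\frac1N\sum_{\nu_1,\mu_1,\nu_2,\mu_2=1}^{N}w_{\nu_1-\mu_1}^{(N)}w_{\nu_2-\mu_2}^{(N)}r_{\nu_1-\nu_2}r_{\mu_1-\mu_2} = \sum_{p,q}\alpha_{pq}r_pr_q,$$

is asymptotically the same as

$$\sum_{p,q}A_{pq}r_pr_q = (2\pi)^3\int_{-\pi}^{\pi}w_N^2(x)\,p^2(x)\,dx.$$

On letting the maximal difference between the two approximating trigonometric forms go to zero, we see that the first two terms on the right of (2) are asymptotically the same as

$$2(2\pi)^3N\int_{-\pi}^{\pi}w_N^2(x)f^2(x)\,dx$$

as $N\to\infty$. For the moment the interest is in $\alpha_N^*$ as an asymptotically unbiased estimate of the spectral density at $\lambda$, that is, of $f(\lambda)$. If $x_t$ is a linear process satisfying conditions (4.2.6), it seems appropriate to choose a weight function $w_N(x) \ge 0$ such that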
$$\int_{-\pi}^{\pi}w_N(x)\,dx = 1 \qquad (3)$$

and, for any $\varepsilon > 0$,

$$w_N(x) \to 0 \qquad (4)$$

uniformly for $|x-\lambda| \ge \varepsilon$, $|x+\lambda| \ge \varepsilon$ as $N\to\infty$. The estimate $\alpha_N^*$ is then asymptotically unbiased, and just as in section 4.2 we can show that the third term on the right of (2) is negligible with respect to the first two terms as $N\to\infty$. Under these assumptions we see that

$$D^2[\alpha_N^*] \sim \frac{4\pi}{N}\int_{-\pi}^{\pi}w_N^2(x)f^2(x)\,dx. \qquad (5)$$

Note that the quantity $\epsilon$ does not appear in this formula. If $x_t$ is a normal process, the third term on the right of (2) vanishes. The weight function $w_N(x)$ is now not assumed to be nonnegative, and condition (4) is replaced by

$$|w_N(x)| \to 0$$

uniformly for $|x\pm\lambda| \ge \varepsilon$ as $N\to\infty$. If the weight function $w_N(x)$ satisfies this condition and (3), the asymptotic relation (5) holds. Since $w_N(x)$ concentrates all its mass in the neighborhood of the points $\lambda$, $-\lambda$ and $f(x)$ is continuous and even, we can rewrite (5) as

$$D^2[\alpha_N^*] \sim \frac{4\pi}{N}f^2(\lambda)\int_{-\pi}^{\pi}w_N^2(x)\,dx.$$

At times it may be convenient to introduce the auxiliary weight function

$$\omega_N(x) = 2w_N(x)$$

defined on $0 \le x \le \pi$. Note that under our assumptions

$$\int_0^{\pi}\omega_N(x)\,dx = 1$$

and, for any $\varepsilon > 0$,

$$|\omega_N(x)| \to 0$$

uniformly for $|x-\lambda| \ge \varepsilon$ as $N\to\infty$, $\lambda > 0$. It then follows that

$$D^2[\alpha_N^*] \sim \frac{2\pi}{N}f^2(\lambda)\int_0^{\pi}\omega_N^2(x)\,dx.$$
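Relation (5) can be tested against the exact variance in a Gaussian example. For a normal process the right side of (2) reduces to $2\operatorname{tr}(TRTR)$ with $T = (w_{\nu-\mu}^{(N)})$ and $R = (r_{\nu-\mu})$. The NumPy sketch below uses an AR(1) process, triangular lag weights with $m = 16$ and $N = 512$ (all hypothetical choices, as are the helper names); the ratio of the exact variance to the right side of (5) should be close to 1, the discrepancy coming from the edge terms of relative order $m/N$ dropped in the derivation.

```python
import numpy as np

def triangular_weights(lag, m, lam):
    """Hypothetical lag weights w_v = (1 - |v|/m)^+ cos(v lam); the spectral window is nonnegative."""
    return np.clip(1 - np.abs(lag) / m, 0, None) * np.cos(lag * lam)

def exact_variance(N, m, lam, phi):
    """Exact D^2 of the spectrograph estimate for a Gaussian AR(1) sample: 2 tr(TRTR) / (2 pi N)^2."""
    lag = np.subtract.outer(np.arange(N), np.arange(N))
    T = triangular_weights(lag, m, lam)
    R = phi ** np.abs(lag) / (1 - phi ** 2)
    TR = T @ R
    return 2 * np.sum(TR * TR.T) / (2 * np.pi * N) ** 2     # sum(TR * TR.T) = tr(TRTR)

def asymptotic_variance(N, m, lam, phi, M=4096):
    """Right-hand side of (5): (4 pi / N) * integral of w_N(x)^2 f(x)^2 dx (Riemann sum)."""
    x = -np.pi + 2 * np.pi * np.arange(M) / M
    v = np.arange(-(m - 1), m)
    w_N = np.cos(np.outer(x, v)) @ triangular_weights(v, m, lam) / (2 * np.pi)
    f = 1 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * x)) ** 2)
    return 4 * np.pi / N * np.sum(w_N ** 2 * f ** 2) * (2 * np.pi / M)

N, m, lam, phi = 512, 16, 1.0, 0.5
ratio = exact_variance(N, m, lam, phi) / asymptotic_variance(N, m, lam, phi)
print(round(ratio, 3))
```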
Let us now consider the variance of the spectrograph estimate $\alpha_N^*$ when the weight function $w(x) = w_N(x)$ does not depend on $N$. The observed process $x_t$ is assumed to be a linear process

$$x_t = \sum_{v=-\infty}^{\infty}a_{t-v}\xi_v$$

with

$$E\xi_v = 0, \qquad E\xi_v^2 = 1, \qquad E\xi_v^4 < \infty.$$

The spectral density $f(y)$ of the process is assumed to be positive and continuous. Let the weight function $w(x)$ be bounded, symmetric about zero and have at most a finite number of discontinuities. Note that the conditions on $w(x)$ imply that (1) is satisfied. Just as in the early part of this section one can prove that the first two terms on the right of (2), divided by $N$, tend to

$$2(2\pi)^3\int_{-\pi}^{\pi}w^2(x)f^2(x)\,dx$$

as $N\to\infty$. However, the third term on the right is no longer of small order compared to the first two terms as $N\to\infty$, and we shall have to consider it in some detail. The third term, divided by $N$, is


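$$\frac{\epsilon}{N}\sum_{\nu_1,\mu_1,\nu_2,\mu_2=1}^{N}w_{\nu_1-\mu_1}w_{\nu_2-\mu_2}\sum_{v=-\infty}^{\infty}a_{\nu_1-v}a_{\mu_1-v}a_{\nu_2-v}a_{\mu_2-v} = \frac{\epsilon}{N}\sum_{v=-\infty}^{\infty}\iint_{-\pi}^{\pi}\Big|\sum_{\alpha=1}^{N}a_{\alpha-v}e^{i\alpha x}\Big|^2\Big|\sum_{\gamma=1}^{N}a_{\gamma-v}e^{i\gamma y}\Big|^2w(x)w(y)\,dx\,dy. \qquad (6)$$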


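Consider the above expression with $w(x)$, $w(y)$ replaced by $e^{inx}$, $e^{imy}$ respectively. Introduce the function

$$c_{n,m}(x) = \sum_{k=-\infty}^{\infty}a_ka_{k-n}e^{ikx}\sum_{l=-\infty}^{\infty}a_la_{l-m}e^{-ilx}.$$

The function $c_{n,m}(x)$ is continuous. Now

$$\int_{-\pi}^{\pi}\Big|\sum_{\alpha=1}^{N}a_{\alpha-v}e^{i\alpha x}\Big|^2e^{inx}\,dx = 2\pi\sum_{\alpha=\max(1,1+n)}^{\min(N,N+n)}a_{\alpha-v}a_{\alpha-n-v},$$

so that

$$\frac{\epsilon}{N}\sum_{v=-\infty}^{\infty}\iint_{-\pi}^{\pi}\Big|\sum_{\alpha=1}^{N}a_{\alpha-v}e^{i\alpha x}\Big|^2\Big|\sum_{\gamma=1}^{N}a_{\gamma-v}e^{i\gamma y}\Big|^2e^{inx}e^{imy}\,dx\,dy = \frac{4\pi^2\epsilon}{N}\sum_{\alpha=\max(1,1+n)}^{\min(N,N+n)}\ \sum_{\gamma=\max(1,1+m)}^{\min(N,N+m)}\frac{1}{2\pi}\int_{-\pi}^{\pi}c_{n,m}(x)e^{i(\gamma-\alpha)x}\,dx.$$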


But if $n, m \ge 0$ the summations over $\alpha$ and $\gamma$ can be carried out, and expression (6), with $e^{inx}$, $e^{imy}$ in place of $w(x)$, $w(y)$, can be written

$$\frac{2\pi\epsilon}{N}\int_{-\pi}^{\pi}e^{i(m-n)x/2}c_{n,m}(x)\frac{\sin\frac{N-n}{2}x\,\sin\frac{N-m}{2}x}{\sin^2\frac x2}\,dx = \frac{2\pi\epsilon}{N}\int_{-\pi}^{\pi}e^{i(m-n)x/2}c_{n,m}(x)\cos\frac{n-m}{2}x\,\frac{\sin^2\frac{N-n}{2}x}{\sin^2\frac x2}\,dx + \frac{\pi\epsilon}{N}\int_{-\pi}^{\pi}e^{i(m-n)x/2}c_{n,m}(x)\frac{\sin\frac{n-m}{2}x}{\sin^2\frac x2}\sin(N-n)x\,dx.$$

According to the Riemann-Lebesgue lemma the second term tends to zero as $N\to\infty$. The first term tends to $4\pi^2\epsilon\,c_{n,m}(0) = 4\pi^2\epsilon\,r_nr_m$. The argument is quite analogous for other values of $n$, $m$. Now consider a finite trigonometric polynomial

$$h(x,y) = \sum_{n,m=-p}^{p}h_{n,m}e^{inx}e^{imy}$$

replacing $w(x)w(y)$ in (6). The limit of the corresponding expression as $N\to\infty$ is then

$$4\pi^2\epsilon\sum_{n,m=-p}^{p}h_{n,m}r_nr_m = 4\pi^2\epsilon\iint_{-\pi}^{\pi}h(x,y)f(x)f(y)\,dx\,dy.$$


Given any function $w(x)$ satisfying the conditions specified above, two finite trigonometric polynomials $h_1(x,y)$, $h_2(x,y)$ can be chosen satisfying the conditions

$$h_1(x,y) \le w(x)w(y) \le h_2(x,y)$$

and

$$\iint_{-\pi}^{\pi}\big[h_2(x,y) - h_1(x,y)\big]\,dx\,dy < \varepsilon$$

for any $\varepsilon > 0$. Since the factors multiplying $w(x)w(y)$ in (6) are nonnegative, expression (6) lies between the corresponding expressions with $h_1(x,y)$, $h_2(x,y)$ respectively in place of $w(x)w(y)$. On letting $N\to\infty$, we see that the $\limsup$ and $\liminf$ of expression (6) lie between

$$4\pi^2\epsilon\iint_{-\pi}^{\pi}h_1(x,y)f(x)f(y)\,dx\,dy \quad\text{and}\quad 4\pi^2\epsilon\iint_{-\pi}^{\pi}h_2(x,y)f(x)f(y)\,dx\,dy.$$

Now on letting $\varepsilon\to0$ it is clear that the limit of (6) as $N\to\infty$ is

$$4\pi^2\epsilon\Big[\int_{-\pi}^{\pi}w(x)f(x)\,dx\Big]^2.$$


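Thus

$$\lim_{N\to\infty}ND^2[\alpha_N^*] = 4\pi\int_{-\pi}^{\pi}w^2(x)f^2(x)\,dx + \epsilon\Big[\int_{-\pi}^{\pi}w(x)f(x)\,dx\Big]^2.$$

If the limiting variance of the estimate is expressed in terms of

$$\omega(x) = 2w(x), \qquad 0 \le x \le \pi,$$

making use of the fact that $w(x)$ and $f(x)$ are even functions, we obtain

$$\lim_{N\to\infty}ND^2[\alpha_N^*] = \epsilon\Big[\int_0^{\pi}\omega(x)f(x)\,dx\Big]^2 + 2\pi\int_0^{\pi}\omega^2(x)f^2(x)\,dx.$$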
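The limit $\lim ND^2[\alpha_N^*] = 4\pi\int w^2f^2\,dx + \epsilon[\int wf\,dx]^2$ can be checked against the exact finite-$N$ variance given by (2). For a linear process the third term of (2) can be rewritten as $\epsilon\sum_v\big(\sum_{\nu,\mu}w_{\nu-\mu}a_{\nu-v}a_{\mu-v}\big)^2$. The sketch below assumes NumPy, an MA(1) process $x_t = \xi_t + \theta\xi_{t-1}$, a fixed weight function with only three nonzero lag weights and a hypothetical fourth cumulant $\epsilon = -1$; none of these choices come from the text, and the helper names are ours.

```python
import numpy as np

theta, eps4 = 0.5, -1.0                   # hypothetical MA(1) coefficient and fourth cumulant of xi
a = [1.0, theta]                          # x_t = xi_t + theta * xi_{t-1}, i.e. a_0 = 1, a_1 = theta
w_lags = {0: 1.0, 1: 0.5, 2: 0.25}        # hypothetical fixed lag weights w_v = w_{-v}
r_lags = {0: 1 + theta ** 2, 1: theta}    # covariances r_v of the MA(1) process

def toeplitz_from(values, N):
    """N x N matrix with entry values[|nu - mu|] (zero for lags not listed)."""
    lag = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    out = np.zeros((N, N))
    for k, val in values.items():
        out[lag == k] = val
    return out

def n_times_variance(N):
    """N * D^2[alpha*_N] computed exactly from the three terms of (2)."""
    T, R = toeplitz_from(w_lags, N), toeplitz_from(r_lags, N)
    TR = T @ R
    first_two = 2 * np.sum(TR * TR.T)                     # 2 tr(TRTR)
    third = 0.0
    for v in range(0, N + 1):                             # xi_v enters x_1..x_N only for v = 0..N
        terms = [(v + j, aj) for j, aj in enumerate(a) if 1 <= v + j <= N]
        q = sum(aj * ak * T[nj - 1, nk - 1] for nj, aj in terms for nk, ak in terms)
        third += q ** 2                                   # (sum_{nu,mu} w_{nu-mu} a_{nu-v} a_{mu-v})^2
    return (first_two + eps4 * third) / (4 * np.pi ** 2 * N)

def limit_value(M=64):
    """4 pi int w^2 f^2 dx + eps4 (int w f dx)^2, by a Riemann sum that is exact here."""
    x = -np.pi + 2 * np.pi * np.arange(M) / M
    w = (w_lags[0] + 2 * sum(val * np.cos(k * x) for k, val in w_lags.items() if k > 0)) / (2 * np.pi)
    f = (1 + theta ** 2 + 2 * theta * np.cos(x)) / (2 * np.pi)
    h = 2 * np.pi / M
    return 4 * np.pi * np.sum(w ** 2 * f ** 2) * h + eps4 * (np.sum(w * f) * h) ** 2

ratio = n_times_variance(1000) / limit_value()
print(round(ratio, 3))
```

With $\epsilon = 0$ (a normal process) only the first term of the limit survives.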


4.5.  Another  Class  of  Estimates 

The  estimates  discussed  in  this  section  are  a  subclass  of  the  family  of 
estimates  of  the  form  (4.1.1)  but  they  are  not  of  the  spectrograph  type. 
However,  they  are  of  considerable  interest  because  they  seem  to  be  well 
suited  for  use  when  building  certain  types  of  analogue  computers  to  estimate 
the  spectral  density  of  a  stationary  process. 

It is convenient to discuss these estimates in terms of an observed continuous parameter stationary normal stochastic process $x_t$, $-\infty < t < \infty$, $Ex_t = 0$, that is continuous in the mean. The spectrum is assumed to be absolutely continuous. One has observed $x_t$, $0 \le t \le T$, and one wishes to estimate the spectral density $f(\lambda)$ of the process. An estimate analogous to (4.1.1) would be

$$\alpha_T^* = \frac{1}{2\pi T}\int_0^T\!\!\int_0^T x_ux_v\,b_T(u,v)\,du\,dv,$$

where $b_T(u,v) = b_T(v,u)$. Here

$$b_T(u,v) = \iint_{-\infty}^{\infty}e^{iux - ivy}\,dW_T(x,y),$$

where $W_T(x,y)$ is of bounded variation.

Let us first discuss this subclass of estimates in a heuristic manner. The basic idea is as follows. Pass the message $x_t$, $0 \le t \le T$, through a linear filter $\mathcal{T}$ with frequency response function $\varphi_T(y)$ highly concentrated about the point $\lambda$ at which one wishes to estimate the spectral density. The average power of the output is then measured. The filtered process is

$$z_t = \mathcal{T}x_t = \int_0^t x_s\,g_T(t-s)\,ds, \qquad 0 \le t \le T,$$

where the real function $g_T(t)$ is the transient response function of $\mathcal{T}$. We assume that $g_T(t) = 0$ for $t < 0$. The average power of the filtered process is

$$\frac1T\int_0^T|z_t|^2\,dt = \frac1T\int_0^T\!\!\int_0^T\!\!\int_0^T x_ux_v\,g_T(t-u)\,g_T(t-v)\,du\,dv\,dt.$$

Here

$$b_T(u,v) = 2\pi\int_0^T g_T(t-u)\,g_T(t-v)\,dt.$$

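A physical device using such an estimate has been suggested by Stoner and Bates [1]. Let $\mathcal{T}$ be a linear filter governed by the differential equation

$$A\big[\ddot z_t + \beta\dot z_t + k^2z_t\big] = x_t, \qquad \beta > 0,$$

where $z_t$ is the output of $\mathcal{T}$. Here $\dot z_t$ denotes the derivative of $z_t$ with respect to $t$. Note that here

$$\varphi_T(y) = \frac{A^{-1}}{k^2 - y^2 + i\beta y}$$

and that, for $\beta < 2k$,

$$g_T(t) = \begin{cases}\dfrac{e^{-\beta t/2}\sin\big(t\sqrt{k^2 - \beta^2/4}\big)}{A\sqrt{k^2 - \beta^2/4}}, & t \ge 0,\\[1ex] 0, & t < 0.\end{cases}$$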
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The  concentration  of  cp  (y)  at  the  points  X,  —  X  is  increased  by  making  /? 
smaller  as  T->oo.  The  power  would  be  a  reasonable  estimate  for  f(X)  = 
/(  —  X),  the  spectral  density  at  X. 
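The two formulas above can be checked against each other numerically. The Python sketch below is illustrative only. The parameter values $A = 2$, $\beta = 0.5$, $k = 1.5$ are arbitrary and are not taken from Stoner and Bates, and the function names are illustrative. It evaluates $\int_0^\infty e^{-iyt}g(t)\,dt$ by the trapezoidal rule and compares the result with the closed-form $\varphi(y)$. It assumes the underdamped case $4k^2 > \beta^2$, which is where the stated $g(t)$ applies.

```python
import cmath
import math


def impulse_response(t, A, beta, k):
    """g(t) for A[z'' + beta z' + k^2 z] = x, underdamped case 4k^2 > beta^2."""
    if t < 0:
        return 0.0
    w = math.sqrt(4.0 * k * k - beta * beta)
    return 2.0 * math.exp(-beta * t / 2.0) * math.sin(w * t / 2.0) / (A * w)


def phi_closed_form(y, A, beta, k):
    """Frequency response 1 / (A (k^2 - y^2 + i beta y))."""
    return 1.0 / (A * (k * k - y * y + 1j * beta * y))


def phi_numeric(y, A, beta, k, t_max=120.0, n=120_000):
    """Trapezoidal approximation of the integral of exp(-i y t) g(t) over [0, t_max].

    t_max must be large compared with 2 / beta so that the neglected tail is tiny.
    """
    h = t_max / n
    total = 0j
    for j in range(n + 1):
        t = j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(-1j * y * t) * impulse_response(t, A, beta, k)
    return total * h


if __name__ == "__main__":
    A, beta, k = 2.0, 0.5, 1.5
    for y in (0.0, 1.0, 1.5, 2.5):
        print(y, phi_numeric(y, A, beta, k), phi_closed_form(y, A, beta, k))
```

For small $\beta$ the maximum of $|\varphi(y)|^2$ lies at $y = \sqrt{k^2 - \beta^2/2}$, essentially at $y = k$. This is why the resonance parameter plays the role of the frequency being examined.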

The class of estimates that we shall discuss in some detail differs slightly from the estimate discussed above: the centering is shifted to the point $\lambda$ at which one wishes to estimate the spectral density by multiplying the message $x_t$ by $e^{it\lambda}$. This is convenient if the analogue computer is an electronic device, since the multiplication can then be carried out by mixing $x_t$ with the wave $e^{it\lambda}$ from a local oscillator of frequency $\lambda$.

The spectral density $f(y)$ of the process $x_t$ is assumed to be positive, continuous and bounded. We shall have to make various assumptions about the transient response function $g_T(t)$ and the frequency characteristic

$$\varphi_T(x) = \int_0^\infty e^{-ixt}\,g_T(t)\,dt.$$

The transient response function $g_T(t)$ is assumed to be a real-valued function that is zero when $t < 0$. But then $\varphi_T(x) = \overline{\varphi_T(-x)}$. Let $|g_T(t)|$, $|g_T(t)|^2$, $|g_T(t)|^4$ and $|\varphi_T(x)|$, $|\varphi_T(x)|^2$, $|\varphi_T(x)|^4$ be integrable. The function $|\varphi_T(x)|^2$ is assumed to behave more and more like a $\delta$-function at $x = 0$ as $T \to \infty$, that is,

$$\int_{-\infty}^{\infty} |\varphi_T(x)|^2\,dx = 1, \qquad \int_{|x|<\varepsilon} |\varphi_T(x)|^2\,dx \to 1 \tag{1}$$

for any $\varepsilon > 0$ as $T \to \infty$, and

$$|\varphi_T(x)| \to 0 \tag{2}$$

uniformly when $|x| > \varepsilon$ as $T \to \infty$. Additional assumptions will be introduced in the development when required.

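After mixing with the local oscillator wave $e^{it\lambda}$ and filtering, the signal $x_t$ becomes

$$z_t = \mathcal{T}\mathcal{M}x_t = \mathcal{T}\,x_t e^{it\lambda} = \int_0^t x_s\, e^{is\lambda}\, g_T(t-s)\,ds, \qquad 0 \le t \le T.$$

Here $\mathcal{M}x_t = x_t e^{it\lambda}$. Our estimate of $f(\lambda)$ is the average power of this output and is given by

$$f_T^*(\lambda) = \frac{1}{T}\int_0^T |z_t|^2\,dt = \frac{1}{T}\int_0^T\!\!\int_0^T\!\!\int_0^T x_u x_v \cos\big((u-v)\lambda\big)\, g_T(t-u)\,g_T(t-v)\,du\,dv\,dt.$$

The imaginary part of $e^{i(u-v)\lambda}$ drops out because the rest of the integrand is symmetric in $u$ and $v$. Thus

$$b_T(u,v) = 2\pi\cos\big((u-v)\lambda\big)\int_0^T g_T(t-u)\,g_T(t-v)\,dt, \qquad 0 \le u, v \le T.$$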
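The device just described (mix, filter, average the power) can be imitated on sampled data. The Python sketch below is a discretised illustration under stated assumptions, not the authors' procedure. It assumes:

* the exponential transient response $g(u) = \sqrt{\beta/\pi}\,e^{-\beta u}$ studied in section 4.6;
* a sampling step $h$;
* input held constant over each step, so that the filter update is exact for that piecewise-constant input.

As a deterministic check it is fed a pure cosine $\cos\lambda_0 t$. This has a line spectrum with mass $1/4$ at $\pm\lambda_0$, so it lies outside the absolutely continuous setting above. Tuned to $\lambda = \lambda_0$, the output power should be close to $\frac14|\varphi_\beta(0)|^2 = 1/(4\pi\beta)$. Tuned far from $\lambda_0$, it should be small. Function names are illustrative.

```python
import cmath
import math


def analogue_estimate(x, h, lam, beta):
    """Discretised sketch of the mix-filter-average device.

    x    : samples x_{nh}, n = 0, 1, ..., of the observed signal
    h    : sampling step
    lam  : local-oscillator frequency (where f is being estimated)
    beta : decay rate of the assumed transient response
           g(u) = sqrt(beta/pi) * exp(-beta*u), u >= 0
    Returns (1/T) * sum |z_n|^2 * h with T = len(x) * h.
    """
    c = math.sqrt(beta / math.pi)
    decay = math.exp(-beta * h)
    gain = c * (1.0 - decay) / beta   # integral of g over one step (input held constant)
    z = 0j
    power = 0.0
    for n, xn in enumerate(x):
        mixed = xn * cmath.exp(1j * lam * n * h)   # mixing with the local oscillator
        z = decay * z + gain * mixed               # exact step of z' = -beta z + c * input
        power += abs(z) ** 2 * h
    return power / (len(x) * h)


if __name__ == "__main__":
    h, T, lam0, beta = 0.01, 2000.0, 2.0, 0.05
    x = [math.cos(lam0 * n * h) for n in range(round(T / h))]
    print(analogue_estimate(x, h, lam0, beta), 1 / (4 * math.pi * beta))
```

The small shortfall from $1/(4\pi\beta)$, about $1.5\%$ with these values, is the start-up transient of the filter. Its relative size is of order $1/(\beta T)$.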

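The estimate of the bias that we get is very crude. It is obtained by making a very restrictive assumption about $g_T(t)$: we assume that $g_T(t) = 0$ when $t > h(T) > 0$, where $h(T) = o(T^{1/2})$. The expected value of $|z_t|^2$ is

$$E|z_t|^2 = \int_{-\infty}^{\infty}\left|\int_0^t e^{-iu(x+\lambda)} g_T(u)\,du\right|^2 f(x)\,dx,$$

which equals $\int_{-\infty}^{\infty} |\varphi_T(x+\lambda)|^2 f(x)\,dx$ when $t \ge h(T)$. On using the Schwarz inequality we see that

$$E|z_t|^2 \le t\, r(0) \int_0^\infty g_T^2(u)\,du = \frac{t\,r(0)}{2\pi}, \qquad 0 \le t < h(T),$$

where $r(0) = Ex_t^2$. The bias is therefore

$$E f_T^*(\lambda) - f(\lambda) = \int_{-\infty}^{\infty} |\varphi_T(x+\lambda)|^2 f(x)\,dx - f(\lambda) + O\!\left(\frac{h(T) + h^2(T)}{T}\right).$$

Since $|\varphi_T(x)|^2$ behaves like a $\delta$-function at $x = 0$ as $T \to \infty$, the estimate is asymptotically unbiased. In section 4.6 we shall consider in some detail a specific estimate of the type discussed here with $g_T(t)$ nonzero for every $t > 0$.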

Now let us consider the asymptotic variance of the estimate. Note that

$$\begin{aligned} b_T(u,v) &= 2\pi\cos\big((u-v)\lambda\big)\int_0^T g_T(t-u)\,g_T(t-v)\,dt\\ &= \frac{1}{4\pi}\int_0^T\!\!\iint_{-\infty}^{\infty} e^{-iux+ivy}\,e^{it(x-y)}\big[\varphi_T(x+\lambda)\overline{\varphi_T(y+\lambda)} + \varphi_T(x-\lambda)\overline{\varphi_T(y-\lambda)}\big]\,dx\,dy\,dt\\ &= \frac{1}{4\pi}\iint_{-\infty}^{\infty} e^{-iux+ivy}\,\frac{e^{iT(x-y)}-1}{i(x-y)}\big[\varphi_T(x+\lambda)\overline{\varphi_T(y+\lambda)} + \varphi_T(x-\lambda)\overline{\varphi_T(y-\lambda)}\big]\,dx\,dy. \end{aligned}$$

The weight function

$$w_T(x,y) = \frac{1}{4\pi}\,\frac{e^{iT(x-y)}-1}{i(x-y)}\big[\varphi_T(x+\lambda)\overline{\varphi_T(y+\lambda)} + \varphi_T(x-\lambda)\overline{\varphi_T(y-\lambda)}\big]$$

is one of the weight functions that correspond to the given estimate. The function $w_T(x,y)$ is the Fourier transform of $b_T(u,v)$, $-\infty < u, v < \infty$, rather than the Fourier transform of $b_T(u,v)$ on the square $0 \le u, v \le T$. For this reason we cannot expect the analogue of (4.1.4) to hold for $w_T(x,y)$. However, it does hold for

$$Q_T(x,y) = \frac{1}{4\pi^2}\int_0^T\!\!\int_0^T b_T(u,v)\,e^{iux-ivy}\,du\,dv,$$

that is,

$$D^2[f_T^*(\lambda)] = \frac{8\pi^2}{T^2}\iint_{-\infty}^{\infty} |Q_T(x,y)|^2 f(x)\,f(y)\,dx\,dy.$$
One would like to find a condition that would allow us to assume that the analogue of (4.1.6) holds, that is,

$$D^2[f_T^*(\lambda)] \sim \frac{8\pi^2}{T^2}\iint_{-\infty}^{\infty} |w_T(x,y)|^2 f(x)\,f(y)\,dx\,dy. \tag{3}$$

The inequality

$$\left|\left(\iint |w_T(x,y)|^2 f(x)f(y)\,dx\,dy\right)^{1/2} - \left(\iint |Q_T(x,y)|^2 f(x)f(y)\,dx\,dy\right)^{1/2}\right| \le \left(\iint_{-\infty}^{\infty} |D_T(x,y)|^2 f(x)f(y)\,dx\,dy\right)^{1/2}$$

follows from the Minkowski inequality. Here $D_T(x,y) = w_T(x,y) - Q_T(x,y)$. But

$$\iint_{-\infty}^{\infty} |D_T(x,y)|^2 f(x)f(y)\,dx\,dy \le C\iint_{(u,v)\notin[0,T]^2}\left|\int_0^T g_T(t-u)\,g_T(t-v)\,dt\right|^2 du\,dv$$

by the Parseval relation, where $C$ is a constant. This holds because $f$ is bounded and $D_T$ is the Fourier transform of $b_T(u,v)$ restricted to the complement of the square $0 \le u, v \le T$. The inner integral vanishes when $u > T$ or $v > T$, and by symmetry it suffices to consider $u < 0$. We assume that

$$\int_{-\infty}^{T}\!\!\int_{-\infty}^{0}\left|\int_0^T g_T(t-u)\,g_T(t-v)\,dt\right|^2 du\,dv = o\left(\iint_{-\infty}^{\infty} |w_T(x,y)|^2 f(x)f(y)\,dx\,dy\right),$$

since then relation (3) holds. We also assume that

$$\max_{|u| \ge A}\int_{-\infty}^{\infty} |\varphi_T(x)|^2\,|\varphi_T(x+u)|^2\,dx = o\left(\int_{-\infty}^{\infty} |\varphi_T(x)|^4\,dx\right) \tag{4}$$

as $T \to \infty$ for any $A > 0$.
Now the variance

$$\begin{aligned} D^2[f_T^*(\lambda)] &\sim \frac{8\pi^2}{T^2}\iint_{-\infty}^{\infty} |w_T(x,y)|^2 f(x)f(y)\,dx\,dy\\ &= \frac{2}{T^2}\iint_{-\infty}^{\infty}\frac{\sin^2\frac{T(x-y)}{2}}{(x-y)^2}\Big\{|\varphi_T(x+\lambda)\varphi_T(y+\lambda)|^2\\ &\qquad + 2\,\mathrm{Re}\big[\varphi_T(x+\lambda)\overline{\varphi_T(y+\lambda)}\,\overline{\varphi_T(x-\lambda)}\,\varphi_T(y-\lambda)\big] + |\varphi_T(x-\lambda)\varphi_T(y-\lambda)|^2\Big\} f(x)f(y)\,dx\,dy\\ &= H_1 + H_2 + H_3. \end{aligned}$$

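We shall show the second term $H_2$ to be of smaller order than $H_1$ and $H_3$ as $T \to \infty$. The spectral density $f(\lambda)$ is bounded by some finite constant $K$. Thus

$$\begin{aligned} &\left|\int_{-\infty}^{\infty}\mathrm{Re}\big[\varphi_T(y+u+\lambda)\overline{\varphi_T(y+\lambda)}\,\overline{\varphi_T(y+u-\lambda)}\,\varphi_T(y-\lambda)\big] f(y+u)f(y)\,dy\right|\\ &\qquad \le K^2\int_{-\infty}^{\infty}|\varphi_T(y+u+\lambda)\varphi_T(y+u-\lambda)|\,|\varphi_T(y+\lambda)\varphi_T(y-\lambda)|\,dy \le K^2\int_{-\infty}^{\infty}|\varphi_T(y+\lambda)\varphi_T(y-\lambda)|^2\,dy \end{aligned}$$

by the Schwarz inequality. Let us assume $\lambda \ne 0$. Then, given any $\varepsilon > 0$, condition (2) implies that

$$\int_{-\infty}^{\infty}|\varphi_T(y+\lambda)\varphi_T(y-\lambda)|^2\,dy < \varepsilon$$

for $T$ sufficiently large. This follows because, for every $y$, at least one of $|y+\lambda|$, $|y-\lambda|$ is at least $|\lambda|$. But then $|H_2| \le 2\pi K^2\varepsilon/T$. Therefore

$$H_2 = o\!\left(\frac{1}{T}\right)$$

as $T \to \infty$. Since the expressions $H_1$ and $H_3$ can be treated in exactly the same manner, we need only consider $H_1$. Now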


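$$H_1 = \frac{2}{T^2}\iint_{-\infty}^{\infty}\frac{\sin^2\frac{Tu}{2}}{u^2}\,|\varphi_T(y+u+\lambda)\,\varphi_T(y+\lambda)|^2 f(y+u)\,f(y)\,dy\,du.$$

We divide $H_1$ into two parts and estimate each part separately. The first part

since $A$ can be chosen arbitrarily large. On the other hand

because of conditions (1), (2) and (4) imposed on $\varphi_T(y)$. Note that $\int_{-\infty}^{\infty}|\varphi_T(y)|^4\,dy \to \infty$ as $T \to \infty$. Indeed, by the Schwarz inequality $\int_{|y|<\varepsilon}|\varphi_T(y)|^2\,dy \le \big(2\varepsilon\int_{-\infty}^{\infty}|\varphi_T(y)|^4\,dy\big)^{1/2}$, and the left side tends to 1 for every $\varepsilon > 0$. Thus

$$H_1 \sim \frac{\pi}{T}\,f^2(\lambda)\int_{-\infty}^{\infty}|\varphi_T(y)|^4\,dy$$

as $T \to \infty$, so that

$$D^2[f_T^*(\lambda)] \sim \frac{2\pi f^2(\lambda)}{T}\int_{-\infty}^{\infty}|\varphi_T(y)|^4\,dy \tag{5}$$

as $T \to \infty$, if $\lambda \ne 0$. It is easily seen that

$$D^2[f_T^*(0)] \sim \frac{4\pi f^2(0)}{T}\int_{-\infty}^{\infty}|\varphi_T(y)|^4\,dy$$

if $\lambda = 0$.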


4.6.  Special  Estimates  of  the  Spectral  Density 

In this section we discuss some specific estimates that have been proposed for the estimation of the spectral density. Let us first consider the periodogram. This estimate was introduced in section 3.1. It was originally designed to isolate "periods" of a time series, as described in Chapter 3, and not for the estimation of the spectral density. In section 4.3 its bias as an estimate of the spectral density $f(\lambda)$ was seen to be very small, at most of the order $(\log N)/N$ as $N \to \infty$. If the observed process is normal, the variance of the periodogram satisfies

$$D^2[I_N(\lambda)] \to \begin{cases} f^2(\lambda) & \text{if } \lambda \ne 0,\\ 2f^2(0) & \text{if } \lambda = 0, \end{cases}$$

as $N \to \infty$. The estimate does not converge in the mean square to $f(\lambda)$. One can even show that it does not converge to any stochastic variable (see Grenander [2]). Thus $I_N(\lambda)$ is not a consistent estimate of $f(\lambda)$, and for this and other reasons it is useless as an estimate of $f(\lambda)$.
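The failure of consistency is easy to see in a small simulation. The Python sketch below is an illustration under assumed conditions. It uses Gaussian white noise with unit variance, so that $f(\lambda) = 1/2\pi$, and a Fourier frequency $\lambda = 2\pi(N/4)/N$. The seed, the replication count and the function names are arbitrary choices. At such a frequency $I_N(\lambda)/f(\lambda)$ is exactly exponentially distributed with mean 1, so its variance stays at $f^2(\lambda)$ whatever the value of $N$.

```python
import cmath
import math
import random


def periodogram(x, lam):
    """I_N(lam) = |sum_{t=1}^{N} x_t exp(i t lam)|^2 / (2 pi N)."""
    s = sum(xt * cmath.exp(1j * t * lam) for t, xt in enumerate(x, start=1))
    return abs(s) ** 2 / (2 * math.pi * len(x))


def periodogram_moments(n, reps, seed=12345):
    """Monte Carlo mean and variance of I_N(lam) for unit-variance Gaussian
    white noise (spectral density f = 1/(2 pi)) at lam = 2 pi (n // 4) / n."""
    rng = random.Random(seed)
    lam = 2 * math.pi * (n // 4) / n
    values = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(periodogram(x, lam))
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / (reps - 1)
    return mean, var
```

Call `periodogram_moments(n, 1000)` for $N = 64$ and $N = 512$ and divide the results by $f$ and $f^2$. Both ratios stay near 1, so the variance does not shrink as $N$ grows.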

The other estimates discussed in this section are all consistent estimates of $f(\lambda)$. All, except for the last, are spectrograph estimates. The last one is a specific example out of the class of estimates discussed in section 4.5. In the following discussion $f(\lambda)$ is assumed to be sufficiently regular, that is, it has a sufficient number of derivatives for the arguments to be carried out. The spectral density $f(\lambda)$ should be thought of as a function defined and regular on a circle, that is, the point $\lambda = -\pi$ is identified with the point $\lambda = \pi$. Formulas (4.4.5) and (4.5.4) will be used in the derivation of the asymptotic variance of the estimates.
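Bartlett [2] has introduced the estimate

$$\frac{1}{2\pi}\sum_{v=-m+1}^{m-1}\left(1-\frac{|v|}{m}\right)\frac{C_v}{N-|v|}\cos v\lambda$$

of $f(\lambda)$, where $m = o(N)$ but tends to infinity with $N$. For convenience we consider a slightly modified estimate

$$f_N^*(\lambda) = \frac{1}{2\pi}\sum_{v=-m+1}^{m-1}\left(1-\frac{|v|}{m}\right)\frac{C_v}{N}\cos v\lambda. \tag{1}$$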

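The weights are

$$w_v = \begin{cases}\dfrac{1}{2\pi}\left(1-\dfrac{|v|}{m}\right)\cos v\lambda, & |v| < m,\\[2mm] 0, & |v| \ge m,\end{cases}$$

and the corresponding weight function is

$$w_N(x) = \frac{1}{4\pi m}\left\{\frac{\sin^2\frac{m}{2}(\lambda-x)}{\sin^2\frac12(\lambda-x)} + \frac{\sin^2\frac{m}{2}(\lambda+x)}{\sin^2\frac12(\lambda+x)}\right\}.$$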
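The asymptotic variance is

$$D^2[f_N^*(\lambda)] \sim \begin{cases}\dfrac{2m}{3N}f^2(\lambda) & \text{if } \lambda \ne 0,\\[2mm] \dfrac{4m}{3N}f^2(0) & \text{if } \lambda = 0.\end{cases}$$

The bias is

$$\begin{aligned} b_N(\lambda) &= Ef_N^*(\lambda) - f(\lambda)\\ &= \frac{1}{2\pi}\sum_{v=-m+1}^{m-1}\left(1-\frac{|v|}{m}\right)\left(1-\frac{|v|}{N}\right) r_v\cos v\lambda - \frac{1}{2\pi}\sum_{v=-\infty}^{\infty} r_v\cos v\lambda\\ &\sim -\frac{1}{2\pi m}\sum_{v=-m+1}^{m-1}|v|\,r_v\cos v\lambda = -\frac{1}{\pi m}\sum_{v=0}^{m-1} v\,r_v\cos v\lambda\\ &= \frac{1}{\pi m}\sum_{v=0}^{m-1}\cos v\lambda\int_{-\pi}^{\pi}\sin vx\; f'(x)\,dx\\ &= \frac{1}{2\pi m}\sum_{v=0}^{m-1}\int_{-\pi}^{\pi}\big[\sin v(x-\lambda) + \sin v(x+\lambda)\big] f'(x)\,dx\\ &= \frac{1}{\pi m}\int_{-\pi}^{\pi}\left\{\frac{\sin(x-\lambda) - \sin m(x-\lambda) + \sin(m-1)(x-\lambda)}{4\,(1-\cos(x-\lambda))} + \frac{\sin(x+\lambda) - \sin m(x+\lambda) + \sin(m-1)(x+\lambda)}{4\,(1-\cos(x+\lambda))}\right\} f'(x)\,dx\\ &\sim \frac{1}{2\pi m}\,P\!\int_{-\pi}^{\pi}\frac{\sin(x-\lambda)}{1-\cos(x-\lambda)}\,f'(x)\,dx, \end{aligned}$$

where the principal value of the last integral is to be taken. The passage from $v\,r_v$ to the integral uses integration by parts: since $r_v = \int_{-\pi}^{\pi}\cos vx\, f(x)\,dx$, we have $v\,r_v = -\int_{-\pi}^{\pi}\sin vx\, f'(x)\,dx$. Note that the weight function of this estimate is positive. This implies that the estimate is positive with probability one, which is certainly a desirable property in many contexts.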

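Another estimate of the spectral density, which might be called the rectangular estimate, is given by

$$f_N^*(\lambda) = \frac{1}{2\pi}\sum_{v=-N+1}^{N-1}\frac{\sin vh}{vh}\,\frac{C_v}{N}\cos v\lambda, \tag{2}$$

where $h = \pi/m$ and $m = o(N)$ tends to infinity with $N$. The factor $\sin vh/(vh)$ is read as 1 when $v = 0$. The weight function of this spectrograph estimate is

$$w_N(x) = \begin{cases}\dfrac{1}{4h} & \text{if } |x-\lambda| < h \text{ or } |x+\lambda| < h,\\[2mm] 0 & \text{otherwise,}\end{cases}$$

and this is what suggests the name rectangular estimate. The variance is

$$D^2[f_N^*(\lambda)] \sim \frac{4\pi}{N}\int_{-\pi}^{\pi} w_N^2(x)\,f^2(x)\,dx \sim \begin{cases}\dfrac{m}{N}f^2(\lambda) & \text{if } \lambda \ne 0,\\[2mm] \dfrac{2m}{N}f^2(0) & \text{if } \lambda = 0.\end{cases}$$

The bias is

$$b_N(\lambda) = \frac{1}{2h}\int_{\lambda-h}^{\lambda+h} f(x)\,dx - f(\lambda) \sim \frac{f''(\lambda)h^2}{6} = \frac{f''(\lambda)\pi^2}{6m^2}.$$

This estimate also has a positive weight function.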
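The bias formula $f''(\lambda)h^2/6$ comes from averaging the Taylor expansion of $f$ over $(\lambda-h, \lambda+h)$. The Python sketch below checks it numerically. It uses a hypothetical smooth density $f(x) = (1 + 0.5\cos x)/2\pi$, chosen only for illustration. For this $f$ the exact ratio of the bias to $f''(\lambda)h^2/6$ is $1 - h^2/20 + \dots$. Function names are illustrative.

```python
import math


def f(x):
    """Hypothetical smooth spectral density, used only for illustration."""
    return (1 + 0.5 * math.cos(x)) / (2 * math.pi)


def f_second(x):
    """Second derivative of the hypothetical f above."""
    return -0.5 * math.cos(x) / (2 * math.pi)


def rectangular_bias(lam, m, n=2000):
    """(1/2h) * integral of f over (lam - h, lam + h) minus f(lam), h = pi/m, midpoint rule."""
    h = math.pi / m
    step = 2 * h / n
    average = sum(f(lam - h + (j + 0.5) * step) for j in range(n)) / n
    return average - f(lam)


if __name__ == "__main__":
    lam = 1.0
    for m in (10, 40, 160):
        approx = f_second(lam) * math.pi ** 2 / (6 * m ** 2)
        print(m, rectangular_bias(lam, m) / approx)   # ratio tends to 1
```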

A third estimate is easily obtained by truncating the expansion of the periodogram in terms of the product lag moments $C_v$. This estimate is

$$f_N^*(\lambda) = \frac{1}{2\pi}\sum_{v=-m}^{m}\frac{C_v}{N}\cos v\lambda, \tag{3}$$

where $m = o(N)$ but tends to infinity with $N$. The weight function corresponding to the estimate is

$$w_N(x) = \frac{1}{4\pi}\left\{\frac{\sin(m+\frac12)(x-\lambda)}{\sin\frac12(x-\lambda)} + \frac{\sin(m+\frac12)(x+\lambda)}{\sin\frac12(x+\lambda)}\right\}.$$

This estimate has rather small bias and variance. However, it may be negative for some values of $\lambda$, since the weight function $w_N(x)$ is not positive for all values of $x$. Under some circumstances it may be rather unpleasant to have a negative estimate of a positive quantity, e.g. when one is interested in $\log f(\lambda)$. The variance of the estimate is

$$D^2[f_N^*(\lambda)] \sim \frac{2}{N}f^2(\lambda)\sum_{v=-m}^{m}\cos^2 v\lambda \sim \begin{cases}\dfrac{2m}{N}f^2(\lambda) & \text{if } \lambda \ne 0,\\[2mm] \dfrac{4m}{N}f^2(0) & \text{if } \lambda = 0,\end{cases}$$

and the bias is

$$b_N(\lambda) = \frac{1}{2\pi}\sum_{v=-m}^{m}\left(1-\frac{|v|}{N}\right) r_v\cos v\lambda - \frac{1}{2\pi}\sum_{v=-\infty}^{\infty} r_v\cos v\lambda \sim \frac{1}{2\pi N}\,P\!\int_{-\pi}^{\pi}\frac{\sin(x-\lambda)}{1-\cos(x-\lambda)}\,f'(x)\,dx.$$
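The negativity of the truncated weight function comes from the side lobes of the Dirichlet kernel $\sin(m+\frac12)t/\sin\frac12 t = \sum_{|v|\le m}e^{ivt}$. The Python sketch below checks the closed form against the series $\frac{1}{2\pi}\sum_{|v|\le m}\cos v\lambda\cos vx$ and confirms that the weight function takes negative values. The choice $\lambda = 1$, $m = 8$ and the function names are illustrative.

```python
import math


def dirichlet(t, m):
    """sin((m + 1/2) t) / sin(t / 2), the sum of exp(i v t) over |v| <= m (limit 2m + 1)."""
    s = math.sin(t / 2)
    if abs(s) < 1e-12:
        return 2.0 * m + 1.0
    return math.sin((m + 0.5) * t) / s


def truncated_weight(x, lam, m):
    """Closed-form weight function of the truncated estimate (3)."""
    return (dirichlet(x - lam, m) + dirichlet(x + lam, m)) / (4 * math.pi)


def truncated_weight_series(x, lam, m):
    """The same function as (1/2 pi) sum_{v=-m}^{m} cos(v lam) cos(v x)."""
    return sum(math.cos(v * lam) * math.cos(v * x) for v in range(-m, m + 1)) / (2 * math.pi)
```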


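Tukey [1] has used the following expression

$$\frac{1}{m}\left[1 + 2\sum_{v=1}^{m-1}\frac{N}{N-v}\,\frac{C_v}{C_0}\cos v\lambda\left(.46\cos\frac{\pi v}{m} + .54\right) + \frac{N}{N-m}\,\frac{C_m}{C_0}\cos m\lambda\,(.46\cos\pi + .54)\right],$$

$$\lambda = \frac{\pi p}{m}, \qquad p = 1, 2, \ldots, m-1,$$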


as an estimate of the spectral mass in the interval $\left(\lambda - \frac12\frac{\pi}{m},\ \lambda + \frac12\frac{\pi}{m}\right)$. It is again understood that $m = o(N)$ and that $m$ tends to infinity with $N$. On dividing by $\pi/m$ we get an estimate of the spectral density at $\lambda$. For convenience we modify this estimate of the spectral density slightly and consider

$$f_N^*(\lambda) = \frac{1}{2\pi}\left[\frac{C_0}{N} + 2\sum_{v=1}^{m}\frac{C_v}{N}\cos v\lambda\left(.46\cos\frac{\pi v}{m} + .54\right)\right]. \tag{4}$$
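This estimate is a linear combination of estimates of the form (3). The corresponding weight function is

$$\begin{aligned} w_N(x) = \frac{1}{4\pi}\Bigg\{ &.54\,\frac{\sin(m+\frac12)(x-\lambda)}{\sin\frac12(x-\lambda)} + .54\,\frac{\sin(m+\frac12)(x+\lambda)}{\sin\frac12(x+\lambda)}\\ &+ .23\,\frac{\sin(m+\frac12)(x-\lambda-\frac{\pi}{m})}{\sin\frac12(x-\lambda-\frac{\pi}{m})} + .23\,\frac{\sin(m+\frac12)(x+\lambda+\frac{\pi}{m})}{\sin\frac12(x+\lambda+\frac{\pi}{m})}\\ &+ .23\,\frac{\sin(m+\frac12)(x-\lambda+\frac{\pi}{m})}{\sin\frac12(x-\lambda+\frac{\pi}{m})} + .23\,\frac{\sin(m+\frac12)(x+\lambda-\frac{\pi}{m})}{\sin\frac12(x+\lambda-\frac{\pi}{m})}\Bigg\}. \end{aligned}$$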
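The statement that (4) is a linear combination of truncated estimates can be verified directly. Since $.46\cos\frac{\pi v}{m}\cos v\lambda = .23[\cos v(\lambda+\frac{\pi}{m}) + \cos v(\lambda-\frac{\pi}{m})]$, estimate (4) equals $.54$ times estimate (3) at $\lambda$, plus $.23$ times estimate (3) at each of $\lambda \pm \pi/m$. In the Python sketch below the lag moments are arbitrary illustrative numbers, because the identity is purely algebraic. Function names are illustrative.

```python
import math


def lag_window_estimate(c, n, lam, k):
    """(1/2 pi) * sum_{v=-m}^{m} k(v) (C_v / N) cos(v lam), using C_{-v} = C_v, k(-v) = k(v).

    c : list [C_0, C_1, ..., C_m] of product lag moments
    n : sample size N
    k : lag window, a function of v
    """
    m = len(c) - 1
    total = k(0) * c[0] / n
    for v in range(1, m + 1):
        total += 2 * k(v) * c[v] / n * math.cos(v * lam)
    return total / (2 * math.pi)


def truncated_estimate(c, n, lam):
    """Estimate (3): lag window identically 1 for |v| <= m."""
    return lag_window_estimate(c, n, lam, lambda v: 1.0)


def tukey_estimate(c, n, lam):
    """Estimate (4): lag window .46 cos(pi v / m) + .54."""
    m = len(c) - 1
    return lag_window_estimate(c, n, lam, lambda v: 0.46 * math.cos(math.pi * v / m) + 0.54)
```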
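The variance of the estimate is

$$D^2[f_N^*(\lambda)] \sim \frac{2}{N}f^2(\lambda)\sum_{v=-m}^{m}\left(.46\cos\frac{\pi v}{m} + .54\right)^2\cos^2 v\lambda \sim \begin{cases}\dfrac{m}{N}f^2(\lambda)\big((.46)^2 + 2(.54)^2\big) & \text{if } \lambda \ne 0,\\[2mm] \dfrac{2m}{N}f^2(0)\big((.46)^2 + 2(.54)^2\big) & \text{if } \lambda = 0.\end{cases}$$

The bias is

$$\begin{aligned} b_N(\lambda) &= \frac{1}{2\pi}\sum_{v=-m}^{m} r_v\left(1-\frac{|v|}{N}\right)\left(.46\cos\frac{\pi v}{m} + .54\right)\cos v\lambda - f(\lambda)\\ &\sim -\frac{.46}{2\pi}\sum_{v=-m}^{m} r_v\left(1 - \cos\frac{\pi v}{m}\right)\cos v\lambda\\ &\sim .23\left[f\!\left(\lambda + \frac{\pi}{m}\right) + f\!\left(\lambda - \frac{\pi}{m}\right) - 2f(\lambda)\right] \sim .23\,f''(\lambda)\left(\frac{\pi}{m}\right)^2 \qquad \text{if } m^2 = o(N). \end{aligned}$$

This estimate also may be negative for some values of $\lambda$, since the corresponding weight function is not positive for all values of $x$.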

Still another spectrograph estimate of interest is the one whose weights are the products of the weights of Bartlett's estimate and of the rectangular estimate. This estimate is

$$f_N^*(\lambda) = \frac{1}{2\pi}\sum_{v=-m+1}^{m-1}\frac{\sin vh}{vh}\left(1-\frac{|v|}{m}\right)\frac{C_v}{N}\cos v\lambda, \tag{5}$$

where $h = \pi/m$. The weight function of the estimate is

$$w_N(x) = \frac{1}{4\pi m}\,\frac{1}{2h}\left\{\int_{\lambda-h}^{\lambda+h}\frac{\sin^2\frac{m}{2}(x-\mu)}{\sin^2\frac12(x-\mu)}\,d\mu + \int_{\lambda-h}^{\lambda+h}\frac{\sin^2\frac{m}{2}(x+\mu)}{\sin^2\frac12(x+\mu)}\,d\mu\right\}.$$

The estimate is positive with probability one since the weight function is positive. The asymptotic variance is

$$D^2[f_N^*(\lambda)] \sim \frac{2m}{N}f^2(\lambda)\int_0^1 (1-u)^2\left(\frac{\sin\pi u}{\pi u}\right)^2 du \qquad \text{if } \lambda \ne 0.$$

The asymptotic bias is the same as in the case of Bartlett's estimate.

It is worthwhile contrasting the inconsistent periodogram with the spectrograph estimates described here, which are consistent estimates of the spectral density. Anyone looking at the problem of estimating the spectral density might have thought the periodogram a good estimate. The statistics $C_v/N$ are good estimates of the covariances $r_v$, $v = 0, 1, \ldots, N-1$, even though they are biased. Since the spectral density is given by

$$f(\lambda) = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty} r_v\cos v\lambda,$$

it might seem reasonable to replace $r_v$ by its estimate $C_v/N$ when $|v| < N$ and by zero when $|v| \ge N$. But we then get

$$\frac{1}{2\pi}\sum_{v=-N+1}^{N-1}\frac{C_v}{N}\cos v\lambda = \frac{1}{2\pi N}\left|\sum_{v=1}^{N} x_v e^{iv\lambda}\right|^2 = I_N(\lambda),$$

i.e., the periodogram. The inconsistency of $I_N(\lambda)$ arises from the fact that, for large $|v|$, the statistics $C_v/N$ are relatively unstable estimates of $r_v$. In order to obtain consistency, one should delete their contribution, as was done in the case of the truncated estimate (3), or give them a very small weight compared with the contribution from the $C_v/N$ with small $|v|$, as was done in the case of the rectangular estimate (2).
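The identity between the all-lags cosine sum and the periodogram is purely algebraic. It follows from $|\sum_v x_v e^{iv\lambda}|^2 = \sum_{s,t}x_s x_t e^{i(t-s)\lambda} = C_0 + 2\sum_{v\ge1}C_v\cos v\lambda$. The Python sketch below checks it on a short series whose values are made up for illustration. Function names are illustrative.

```python
import cmath
import math


def lag_moments(x):
    """C_v = sum_{t=1}^{N-v} x_t x_{t+v} for v = 0, 1, ..., N-1."""
    n = len(x)
    return [sum(x[t] * x[t + v] for t in range(n - v)) for v in range(n)]


def estimate_from_all_lags(x, lam):
    """(1/2 pi) sum_{|v|<N} (C_v / N) cos(v lam): every r_v replaced by C_v / N."""
    n = len(x)
    c = lag_moments(x)
    s = c[0] + 2 * sum(c[v] * math.cos(v * lam) for v in range(1, n))
    return s / (2 * math.pi * n)


def periodogram(x, lam):
    """I_N(lam) = |sum_{v=1}^{N} x_v exp(i v lam)|^2 / (2 pi N)."""
    s = sum(xv * cmath.exp(1j * v * lam) for v, xv in enumerate(x, start=1))
    return abs(s) ** 2 / (2 * math.pi * len(x))
```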

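Let us now consider a special estimate of the type considered in section 4.5. Note that all those estimates are positive with probability one. Let

$$g_\beta(t) = \begin{cases}\sqrt{\beta/\pi}\;e^{-\beta t} & \text{when } t > 0,\\ 0 & \text{when } t < 0,\end{cases}$$

so that

$$\varphi_\beta(x) = \int_0^\infty e^{-ixt}\sqrt{\beta/\pi}\;e^{-\beta t}\,dt = \frac{\sqrt{\beta/\pi}}{\beta + ix}.$$

Here $\beta = \beta(T)$ is a function of $T$ which approaches zero as $T \to \infty$. Clearly

$$\int_{-\infty}^{\infty}|\varphi_\beta(x)|^2\,dx = \frac{\beta}{\pi}\int_{-\infty}^{\infty}\frac{dx}{\beta^2 + x^2} = 1.$$

Moreover

$$\int_{-\infty}^{\infty}|\varphi_\beta(x)|^4\,dx = \frac{\beta^2}{\pi^2}\int_{-\infty}^{\infty}\frac{dx}{(\beta^2 + x^2)^2} = \frac{1}{2\pi\beta}.$$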
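The two integrals above, and the concentration required by condition (1) of section 4.5, can be checked numerically. The Python sketch below substitutes $x = \beta\tan\theta$ to map the real line onto $(-\pi/2, \pi/2)$ and then uses the midpoint rule. It also evaluates the closed form $\int_{|x|<\varepsilon}|\varphi_\beta|^2\,dx = \frac{2}{\pi}\arctan(\varepsilon/\beta)$, which tends to 1 as $\beta \to 0$. The values of $\beta$ and $\varepsilon$ and the function names are arbitrary illustrations.

```python
import math


def phi_beta_sq(x, beta):
    """|phi_beta(x)|^2 = (beta / pi) / (beta^2 + x^2)."""
    return (beta / math.pi) / (beta * beta + x * x)


def integrate_line(func, beta, n=20_000):
    """Integral of func over the real line via x = beta * tan(theta), midpoint rule in theta."""
    step = math.pi / n
    total = 0.0
    for j in range(n):
        theta = -math.pi / 2 + (j + 0.5) * step
        x = beta * math.tan(theta)
        total += func(x) * beta / math.cos(theta) ** 2 * step
    return total


def mass_near_zero(eps, beta):
    """Closed form of the integral of |phi_beta|^2 over |x| < eps."""
    return 2 / math.pi * math.atan(eps / beta)
```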

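Let us now consider the asymptotic bias of the estimate

$$f_T^*(\lambda) = \frac{1}{T}\int_0^T |z_t|^2\,dt,$$

where

$$z_t = \int_0^t x_s\, e^{is\lambda}\, g_\beta(t-s)\,ds.$$

Now

$$\int_0^t e^{-iu(x-\lambda)}\,g_\beta(u)\,du = \sqrt{\frac{\beta}{\pi}}\;\frac{1 - e^{-it(x-\lambda) - \beta t}}{\beta + i(x-\lambda)},$$

so that

$$\begin{aligned} Ef_T^*(\lambda) &= \frac{1}{T}\int_0^T\!\!\int_{-\infty}^{\infty}\left|\int_0^t e^{-iu(x-\lambda)}g_\beta(u)\,du\right|^2 f(x)\,dx\,dt\\ &= \int_{-\infty}^{\infty}\frac{\beta}{\pi\,|\beta + i(x-\lambda)|^2}\,\frac{1}{T}\int_0^T\big[1 - 2e^{-\beta t}\cos(x-\lambda)t + e^{-2\beta t}\big]\,dt\; f(x)\,dx\\ &= \int_{-\infty}^{\infty}\frac{\beta}{\pi\,|\beta + i(x-\lambda)|^2}\left[1 - \frac{2\beta}{T\,|\beta + i(x-\lambda)|^2} + \frac{1}{2\beta T}\right] f(x)\,dx + O(e^{-\beta T}). \end{aligned}$$

The bias is

$$\begin{aligned} b_T(\lambda) &= Ef_T^*(\lambda) - f(\lambda)\\ &= \frac{\beta}{\pi}\int_{-\infty}^{\infty}\frac{f(x) - f(\lambda)}{\beta^2 + (x-\lambda)^2}\,dx - \frac{2\beta^2}{\pi T}\int_{-\infty}^{\infty}\frac{f(x)}{(\beta^2 + (x-\lambda)^2)^2}\,dx + \frac{1}{2\pi T}\int_{-\infty}^{\infty}\frac{f(x)}{\beta^2 + (x-\lambda)^2}\,dx + O(e^{-\beta T}). \end{aligned}$$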
Note that

$$\int_{-\infty}^{0}\!\!\int_{-\infty}^{0}\left|\int_0^T g_\beta(t-u)\,g_\beta(t-v)\,dt\right|^2 du\,dv \le \frac{1}{16\pi^2\beta^2} = o\!\left(\frac{T}{\beta}\right)$$

if $1/T = o(\beta)$. Clearly condition (4.5.4) is satisfied if $1/T = o(\beta)$.

On using (4.5.5) we see that the asymptotic variance of the estimate is

$$D^2[f_T^*(\lambda)] \sim \frac{2\pi f^2(\lambda)}{T}\int_{-\infty}^{\infty}|\varphi_\beta(x)|^4\,dx = \frac{f^2(\lambda)}{\beta T}$$

if $\lambda \ne 0$, and that

$$D^2[f_T^*(0)] \sim \frac{2f^2(0)}{\beta T}.$$

Before closing this section it is worthwhile looking at the consistency of the spectrograph estimates that we have considered. As we mentioned, consistency was due to the attenuation of the influence of the product lag moments $C_v/N$ with $|v|$ large. However, this can be looked at from a different point of view. All these consistent spectrograph estimates were smoothings of the periodogram. As $N \to \infty$, the weight function approached a $\delta$-function in behavior at $\lambda$ and $-\lambda$, but slowly enough that the estimate was consistent.

4.7.  The  Mean  Square  Error  of  Estimates 

It  is  clear  from  the  general  discussion  of  estimates  of  the  spectral  density, 
as  well  as  from  the  discussion  of  special  estimates,  that  one  cannot  improve 
the  asymptotic  behavior  of  the  bias  and  variance  of  the  estimate  simultaneously. 
If  the  estimate  is  modified  so  as  to  improve  the  asymptotic  behavior  of  the 
bias,  the  variance  will  behave  worse  and  vice  versa.  One  is  therefore  led  to 
consider  some  function  of  both  the  bias  and  the  variance  of  the  estimate.  A 
reasonable  choice  of  such  a  function  is  the  mean  square  error  of  the  estimate, 
that is,

$$E|f_N^*(\lambda) - f(\lambda)|^2 = b_N^2(\lambda) + D^2[f_N^*(\lambda)].$$


Of  course,  a  small  mean  square  error  may  not  be  the  only  property  required 
of  an  estimate.  If  one  were  to  make  use  of  an  estimate  of  the  spectral  density 
in  a  prediction  problem,  it  would  seem  reasonable  to  require  also  that  the 
estimate  be  positive  with  probability  one  as  we  then  have  to  deal  with 
the  logarithm  of  the  spectral  density. 

Consider  the  mean  square  error  of  the  estimates  discussed  in  section  4.6. 
It  should  be  noted  that  the  discussion  concerns  itself  with  the  asymptotic 
behavior of the mean square error. Assuming that $m$ is of the form $m = KN^\alpha$, where $K$ is a constant, we ask for the $\alpha$ which ensures that the mean square error is of smallest order as $N \to \infty$. Only in a few cases will the optimal choice of $K$ be discussed.

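In the case of Bartlett's estimate

$$E|f_N^*(\lambda) - f(\lambda)|^2 \sim \frac{2m}{3N}f^2(\lambda) + \left[\frac{1}{2\pi m}\,P\!\int_{-\pi}^{\pi}\frac{\sin(x-\lambda)}{1-\cos(x-\lambda)}\,f'(x)\,dx\right]^2,$$

$\lambda \ne 0$, as $m, N \to \infty$, $m = o(N)$. If $m = KN^\alpha$, the optimal choice of $\alpha$ is $\alpha = \frac13$, for then

$$E|f_N^*(\lambda) - f(\lambda)|^2 = O(N^{-2/3}).$$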
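The rectangular estimate (4.6.2) has mean square error

$$E|f_N^*(\lambda) - f(\lambda)|^2 \sim \frac{m}{N}f^2(\lambda) + \left[\frac{f''(\lambda)\pi^2}{6m^2}\right]^2, \qquad \lambda \ne 0,$$

as $m, N \to \infty$, $m = o(N)$. If $m = KN^\alpha$, the optimal choice of $\alpha$ is $\alpha = \frac15$, and then

$$E|f_N^*(\lambda) - f(\lambda)|^2 \sim (\gamma K + \delta/K^4)\,N^{-4/5},$$

where $\gamma = f^2(\lambda)$ and $\delta = [f''(\lambda)\pi^2/6]^2$. The coefficient $\gamma K + \delta/K^4$ is minimized by taking

$$K = \left(\frac{4\delta}{\gamma}\right)^{1/5} = \left(\frac{\pi^2|f''(\lambda)|}{3f(\lambda)}\right)^{2/5}.$$

Then

$$E|f_N^*(\lambda) - f(\lambda)|^2 \sim 5\cdot 4^{-4/5}\,\gamma^{4/5}\delta^{1/5}\,N^{-4/5} = 5\cdot 4^{-4/5}\,(f(\lambda))^{8/5}\,|f''(\lambda)|^{2/5}\left(\frac{\pi^2}{6}\right)^{2/5} N^{-4/5}.$$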
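The optimal constant follows from setting the derivative of $\gamma K + \delta/K^4$ to zero, i.e. $\gamma = 4\delta/K^5$. This gives $K = (4\delta/\gamma)^{1/5}$ and a minimum of $\frac54\gamma K = 5\cdot4^{-4/5}\gamma^{4/5}\delta^{1/5}$. The Python sketch below checks these expressions for hypothetical values $f(\lambda) = 0.3$ and $f''(\lambda) = -0.8$, chosen only for illustration. Function names are illustrative.

```python
import math


def mse_coefficient(K, gamma, delta):
    """Coefficient of N^(-4/5) in the asymptotic mean square error: gamma*K + delta/K^4."""
    return gamma * K + delta / K ** 4


def optimal_K(gamma, delta):
    """Stationary point of gamma*K + delta/K^4, where gamma = 4 delta / K^5."""
    return (4 * delta / gamma) ** 0.2


def minimal_coefficient(gamma, delta):
    """Value at the optimum: 5 * 4^(-4/5) * gamma^(4/5) * delta^(1/5)."""
    return 5 * 4 ** -0.8 * gamma ** 0.8 * delta ** 0.2
```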
The truncated estimate (4.6.3) has mean square error

$$E|f_N^*(\lambda) - f(\lambda)|^2 \sim \frac{2m}{N}f^2(\lambda) + \left[\frac{1}{2\pi N}\,P\!\int_{-\pi}^{\pi}\frac{\sin(x-\lambda)}{1-\cos(x-\lambda)}\,f'(x)\,dx\right]^2.$$

Here there is no optimal choice of $\alpha$. Clearly the mean square error is asymptotically the same as

$$2KN^{\alpha-1}f^2(\lambda)$$

if $m = KN^\alpha$. However, by choosing $\alpha$ smaller we implicitly assume $f(\lambda)$ to be more regular.

The mean square error of the estimate (4.6.4) suggested by Tukey is given by

$$E|f_N^*(\lambda) - f(\lambda)|^2 \sim \frac{m}{N}f^2(\lambda)\,(.7748) + \left(.23\,f''(\lambda)\,\frac{\pi^2}{m^2}\right)^2.$$

If $m = KN^\alpha$, the optimal choice of $\alpha$ is $\frac15$. A discussion analogous to that of the rectangular estimate indicates that the best choice of $K$ is

$$K = \left(\frac{.2116\,\pi^4\,|f''(\lambda)|^2}{.7748\,f^2(\lambda)}\right)^{1/5},$$

and then

$$E|f_N^*(\lambda) - f(\lambda)|^2 \sim 5\cdot 4^{-4/5}\,(f(\lambda))^{8/5}\,|f''(\lambda)|^{2/5}\,(.7748)^{4/5}\,(.23\pi^2)^{2/5}\,N^{-4/5}.$$

Note that the asymptotic mean square error is of the same order as that of the rectangular estimate, but with a coefficient smaller by about a fifth.

The mean square errors of the last two estimates considered in section 4.6 are $O(N^{-2/3})$ and $O(T^{-2/3})$ respectively, if one chooses $m = KN^{1/3}$ and $\beta = KT^{-1/3}$. Note that all of the spectrograph estimates except the rectangular estimate are well suited for computation, since one only needs to compute $m$ of the product lag moments $C_v/N$.

4.8.  An  Example  from  Statistical  Optics 

Consider the following model of light, which has been suggested by Parke [1]. At a given point $(x, y, z)$ at time $t$ the electric field vector $e_t$ is given by

$$e_t = \int_{-\infty}^{\infty}\exp\left[i\lambda\left(t - \frac{x}{c}\right)\right] dz(\lambda),$$


where $z(\lambda)$ is an orthogonal stochastic process taking as values vectors in the $(y,z)$-plane. The components of $e_t = (e_t^{(1)}, e_t^{(2)})$ in the $y$ and $z$ directions form the Maxwell vector. It is assumed to be stationary, with spectrum determined by the matrix of the spectral distribution functions

$$(F_{ij}(\lambda)), \qquad i, j = 1, 2.$$

This might be called the coherency matrix, and it describes the polarization of the light. If $F_{11}(\lambda) = F_{22}(\lambda)$ and $F_{12}(\lambda) = F_{21}(\lambda) = 0$, the light is circularly polarized. If $F_{22} = F_{12} = F_{21} = 0$, the light is linearly polarized, and we shall deal with this case.

The two following examples are highly idealized versions of methods used in practice to estimate the spectrum. Consider a grating spectroscope with a grating width $A$ and a large number of lines $n$. Huygens' principle tells us that at the point $(x_0, d)$ the electric field is the sum of the contributions from the various grating intervals

$$\sum_{v=1}^{n} x(t_v).$$

Here we have set

$$t_v = t + \frac{1}{c}\sqrt{x_0^2 + (d + vA)^2},$$

where $c$ stands for the velocity of light. Using the approximation

$$t_v \approx t + \frac{x_0}{c}\left(1 + \frac{(d+vA)^2}{2x_0^2}\right) \approx t + \frac{1}{c}\left(x_0 + \frac{d^2}{2x_0}\right) + \frac{dvA}{cx_0} = a + bdv,$$

with $a = t + (x_0 + d^2/2x_0)/c$ and $b = A/(cx_0)$, the intensity at the point of observation is seen to be proportional to

$$\frac{1}{n}\left|\sum_{v=1}^{n} x(a + bdv)\right|^2.$$

The mean value of the intensity at $d$ is then proportional to

$$\frac{1}{2\pi n}\int_{-\infty}^{\infty}\frac{\sin^2\frac12 nbd\lambda}{\sin^2\frac12 bd\lambda}\,f(\lambda)\,d\lambda,$$

which for large values of $n$ tends to

$$\frac{1}{bd}\left[f(0) + 2\sum_{N=1}^{\infty} f\!\left(\frac{2\pi N}{bd}\right)\right]. \tag{1}$$

If the spectrum is essentially confined to an interval $(\alpha, \beta)$, $0 < \alpha < \beta < \infty$, expression (1) is given by

$$\frac{2}{bd}\,f\!\left(\frac{2\pi c x_0}{dA}\right)$$

if $d < 4\pi c x_0/(A\beta)$, so that we estimate the spectral density at the frequency $2\pi c x_0/(dA)$. By varying $d$ we can estimate the spectral density in a certain range. The limiting variance of this estimate is positive. However, this should not be taken too seriously, since what one actually observes in practice is not the intensity at an isolated point but rather the average over a small but finite interval $d_1 < d < d_2$. This "smoothing" is carried out when the light is received on a photographic plate. By analogy with what has been said in section 4.6 about the periodogram, one might expect that in this case too the smoothing produces a consistent estimate. It would be interesting to find out whether this can be shown rigorously.
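The limit used above rests on a property of $\frac1n\sin^2\frac{n\theta}{2}\big/\sin^2\frac{\theta}{2}$, with $\theta = bd\lambda$. This function has integral $2\pi$ over each period and concentrates near $\theta \equiv 0 \pmod{2\pi}$ as $n$ grows. The Python sketch below checks the closed form against the direct sum over the $n$ grating lines. It then measures how much of the mass lies within $|\theta| < 0.3$ for $n = 200$. The values of $n$, the width 0.3 and the function names are arbitrary illustrations.

```python
import cmath
import math


def grating_sum_sq(theta, n):
    """|sum_{v=1}^{n} exp(i v theta)|^2 computed term by term."""
    return abs(sum(cmath.exp(1j * v * theta) for v in range(1, n + 1))) ** 2


def grating_kernel(theta, n):
    """Closed form sin^2(n theta / 2) / sin^2(theta / 2), limit n^2 where sin(theta/2) = 0."""
    s = math.sin(theta / 2)
    if abs(s) < 1e-12:
        return float(n * n)
    return math.sin(n * theta / 2) ** 2 / s ** 2


def mass_fraction_near_zero(n, width, points=20_000):
    """Share of the integral of kernel/n over (-pi, pi) lying in |theta| < width,
    together with the total integral (midpoint rule)."""
    step = 2 * math.pi / points
    total = inside = 0.0
    for j in range(points):
        theta = -math.pi + (j + 0.5) * step
        value = grating_kernel(theta, n) / n * step
        total += value
        if abs(theta) < width:
            inside += value
    return inside / total, total
```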

Let us now consider another method, which uses a prism. The prism separates the harmonic components of the incident ray of light, and the resolved light is projected on a plate. Smoothing the intensity over a small range of frequencies $(\lambda_1, \lambda_2)$, we obtain another estimate of the spectral energy in $(\lambda_1, \lambda_2)$. Note that both estimates of this section are of the general form introduced in section 4.1.
CHAPTER 5


APPLICATIONS 


5.0.  Introduction 

In this chapter three fields in which stationary models have proved to be useful are considered. We look at some problems that arise in the study of random noise, turbulence and storm-generated ocean waves. The rationale behind the choice of a stationary statistical model is discussed.

Knowledge  of  the  spectrum  of  the  process  gives  one  information  about 
the  structure  of  the  phenomenon  of  interest.  In  some  cases  one  can  obtain 
a  partial  determination  of  the  spectrum  of  the  process  by  deductive  methods, 
e.g.  from  the  equations  of  motion  in  turbulence.  Such  a  procedure  has  to 
be  supplemented  by  an  empirical  approach.  Some  of  the  methods  of  estimat- 
ing the  spectrum  from  data  in  these  different  contexts  are  discussed. 

The  reader  should  consult  Goldman  [1],  Rice  [1],  James-Nichols-Phillips 
[1],  Lawson-Uhlenbeck  [1]  and  Woodward  [1]  for  an  extended  study  of 
noise  and  related  topics. 

5.1.  Derivation  of  Spectra  of  Random  Noise 

There are several types of electronic random noise. We shall discuss only two of them in this section. Shot noise was mentioned already in section 1.6, where its spectrum was derived. In the case of shot noise, the observed current is due to the arrival of electrons at the anode of an electron tube. As this happens in a random manner, the current fluctuates around a mean value, the direct current or d.c. component. The model used in section 1.6 is based on the assumption that the arrival times are governed by a Poisson process with parameter $\beta$. As $\beta$ tends to infinity it is easy to prove that the normed process tends distributionwise to a normal process.

Another very important type of random noise is commonly referred to as thermal noise. Between the endpoints of any conductor there will be a fluctuating voltage due to the random motion of the electrons in the conductor. In order to derive the spectrum of this voltage we shall carry out a thought experiment following C. W. McCombie [1]. Consider a resistance $R$ and the random voltage $x_t$ due to thermal noise. We connect the resistance in series with an inductance $L$ and a capacitance $C$. The resonant frequency of this circuit is $\lambda_0 = 1/(2\pi\sqrt{LC})$. Let the random current produced by $x_t$ be $y_t$. We know that $y_t$ is the result of a filter operating on $x_t$ with frequency response function

$$\varphi(\lambda) = \frac{1}{R + i\left(2\pi\lambda L - \dfrac{1}{2\pi\lambda C}\right)}.$$

Introduce the spectral densities $f_x(\lambda)$ and $f_y(\lambda)$ corresponding to the processes $x_t$ and $y_t$ respectively. The spectral density of the current is (see section 1.9)

$$f_y(\lambda) = |\varphi(\lambda)|^2 f_x(\lambda).$$

If $R$ is very small compared to $\sqrt{L/C}$, the filter will allow only frequencies in the neighborhood of $\pm\lambda_0$ to pass through it. But then $f_x(\lambda)$ can be regarded as a constant $f_x(\lambda_0)$ for all these frequencies, and hence

$$Ey_t^2 = \int_{-\infty}^{\infty} f_y(\lambda)\,d\lambda \approx f_x(\lambda_0)\int_{-\infty}^{\infty}|\varphi(\lambda)|^2\,d\lambda = \frac{f_x(\lambda_0)}{2RL}. \tag{1}$$

Now we consider this situation from the point of view of statistical mechanics. The total energy is the sum of the electric energy and the magnetic energy. The latter is $\frac12 Ly_t^2$. According to the equipartition theorem (see Tolman [1], p. 93)

$$E\left[\tfrac12 Ly_t^2\right] = \tfrac12 kT, \tag{2}$$

where $T$ is the absolute temperature and $k$ is Boltzmann's constant. On using (1) and (2) it is seen that

$$f_x(\lambda_0) \approx 2RkT.$$
This is called Nyquist's law. The spectral density $f_x(\lambda_0)$ is independent of $\lambda_0$, so that we have white noise. It is clear that the spectral density cannot be constant on the whole real axis, since we would then have infinite power. However, the result holds up to very high frequencies. A detailed discussion of the upper limit of the thermal noise spectrum can be found in Lawson and Uhlenbeck [1], p. 77.

One  can  also  prove  that  the  thermal  noise  voltage  is  a  normal  process. 
We  shall  not  carry  out  the  proof  but  refer  to  Lawson  and  Uhlenbeck  [1], 
p.  67. 
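The value $\int|\varphi(\lambda)|^2\,d\lambda = 1/(2RL)$ behind Nyquist's law holds exactly for every $R > 0$, not only for small $R$. Only the replacement of $f_x(\lambda)$ by $f_x(\lambda_0)$ needs $R$ to be small. The Python sketch below checks this numerically. It assumes the series-circuit frequency response $\varphi(\lambda) = 1/[R + i(2\pi\lambda L - 1/(2\pi\lambda C))]$, with $\lambda$ measured in cycles. It then recovers $f_x(\lambda_0) = 2RkT$ from $Ey_t^2 = kT/L$. The circuit values, $kT$ and the function names are arbitrary illustrations.

```python
import math


def response_sq(lam, R, L, C):
    """|phi(lam)|^2 for the series R-L-C circuit, with lam measured in cycles."""
    w = 2 * math.pi * lam
    return 1.0 / (R * R + (w * L - 1.0 / (w * C)) ** 2)


def integral_response_sq(R, L, C, n=400_000):
    """Integral of |phi|^2 over the real line (twice the integral over lam > 0),
    using lam = tan(theta) and the midpoint rule on (0, pi/2)."""
    step = (math.pi / 2) / n
    total = 0.0
    for j in range(n):
        theta = (j + 0.5) * step
        total += response_sq(math.tan(theta), R, L, C) / math.cos(theta) ** 2 * step
    return 2 * total


def nyquist_density(R, L, C, kT):
    """f_x(lambda_0) implied by E y^2 = kT / L and E y^2 = f_x * integral of |phi|^2."""
    return kT / (L * integral_response_sq(R, L, C))
```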
5.2. Measuring Noise Spectra


In the last section we saw that the spectrum of the noise can sometimes be computed if we have sufficient knowledge of the mechanism that generates the noise. When studying the noise output of an electronic device that is not too complicated, one usually considers the output as generated by the various noise sources in the device and by external noise fed into the device. If the statistical structure of these noise components is known, we can, at least theoretically, compute the spectrum of the output noise. This procedure breaks down if the device is too complex or if some of the sources produce noise with unknown spectra. If the device has nonlinear elements, it will in general not be sufficient to know the spectra of the noise sources and of the external noise; the whole probability structure of the noise sources then has to be specified.

In such a case it will be necessary to actually measure the spectrum of the output noise. It may also be of interest to study the probability distribution of the output noise, e.g., to find out whether or not the output noise is normal. This last problem is briefly discussed in Chapter 8. The first problem can be treated in various ways.

One very natural procedure is to feed the noise into a linear device with the frequency response function $\varphi_\alpha(\lambda)$, where $\alpha$ is a parameter that can be varied. Measuring the average power of the output we have

$$P_\alpha = \int_{-\infty}^{\infty} |\varphi_\alpha(\lambda)|^2 f(\lambda)\,d\lambda.$$

Now if the device acts as a selective circuit, allowing only frequencies very close to $\alpha$ to pass through, it is clear that we can use $P_\alpha$ to estimate $f(\alpha)$, and as $\alpha$ is varied we estimate the spectral density over the range of variation. Note that this estimate is a generalization of the one dealt with in section 4.5 and is of the general form introduced in section 4.1. It has been suggested (see Davenport, Johnson, Middleton [1]) that one could choose the device as a tuned circuit (we would then estimate the spectral density at the resonant frequency, see section 4.5) or use a feedback amplifier in which the feedback loop has a variable characteristic attenuating frequencies that are not close to $\alpha$.

A disadvantage of these methods is that, as the elements of the tuned circuit vary, the function $|\varphi_\alpha(\lambda)|^2$ varies not only in its location on the frequency axis but also in shape. A method that avoids this has been suggested by J. S. Cobine and J. R. Curry [1].
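Mix $x_t$ with the wave $A\cos(t\lambda + \phi)$ from a local oscillator of frequency $\lambda$ with amplitude $A$ and a random phase $-\pi < \phi < \pi$, uniformly distributed and independent of the process. Then the modulated wave

$$y_t = \mathfrak{M}x_t = A\cos(t\lambda + \phi)\,x_t$$

has the covariances

$$\rho_{s-t} = E y_s y_t = \frac{A^2}{2}\cos\big((s-t)\lambda\big)\,E x_s x_t = \frac{A^2}{2}\cos\big((s-t)\lambda\big)\,r_{s-t}.$$

Assuming the spectral density of the original process to be $f(\mu)$, the spectral density of $y_t$ is

$$f_y(\mu) = \frac{A^2\big(f(\mu+\lambda) + f(\mu-\lambda)\big)}{4},$$

as is easily verified. Let $y_t$ pass through a filter $\mathfrak{F}$ with frequency response function $\varphi(\mu)$. The average power of the output gives us an estimate of

$$\int_{-\infty}^{\infty}|\varphi(\mu)|^2 f_y(\mu)\,d\mu = \frac{A^2}{4}\int_{-\infty}^{\infty}\big(|\varphi(x-\lambda)|^2 + |\varphi(x+\lambda)|^2\big) f(x)\,dx.$$

If $|\varphi(\mu)|^2$ is concentrated around $\mu = 0$, this is close to $f(\lambda)/2$ (when $A^2\int|\varphi(\mu)|^2\,d\mu = 1$, and using $f(-\lambda) = f(\lambda)$). Now the filter is kept fixed, but the frequency $\lambda$ of the local oscillator is varied. We then see that varying $\lambda$ means translating the weight functions $|\varphi(x+\lambda)|^2$ and $|\varphi(x-\lambda)|^2$ that belong to the two sidebands, and in this way the difficulty of change in shape of the weight functions mentioned above is avoided.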
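Both steps of the heterodyne argument can be illustrated numerically. The sketch below first averages $A^2\cos(s\lambda+\phi)\cos(t\lambda+\phi)$ over a uniform phase, which should give $\frac{A^2}{2}\cos((s-t)\lambda)$; it then evaluates the expected output power $\frac{A^2}{4}\int|\varphi(\mu)|^2[f(\mu+\lambda)+f(\mu-\lambda)]\,d\mu$ for a hypothetical Gaussian $|\varphi(\mu)|^2$ of unit area and an illustrative even density $f(\mu) = 1/(1+\mu^2)$. With $A = 1$ the result should be close to $f(\lambda)/2$. Neither the filter shape nor the density comes from the text.

```python
import math


def phase_average(s, t, lam, A=1.0, m=64):
    """E[A cos(s*lam + phi) * A cos(t*lam + phi)] for phi uniform on (-pi, pi).

    The m-point midpoint rule in phi is exact here (up to rounding) because the
    integrand is a trigonometric polynomial of degree 2 in phi and m > 2."""
    total = 0.0
    for j in range(m):
        phi = -math.pi + (j + 0.5) * 2.0 * math.pi / m
        total += math.cos(s * lam + phi) * math.cos(t * lam + phi)
    return A * A * total / m


def sideband_mean(lam, f, width, A=1.0, n=4000):
    """(A^2/4) * integral |phi(mu)|^2 [f(mu + lam) + f(mu - lam)] d mu for a
    hypothetical Gaussian |phi(mu)|^2 of unit area and standard deviation
    `width`, centred at mu = 0 and truncated at +-10 widths."""
    start = -10.0 * width
    h = 20.0 * width / n
    total = 0.0
    for i in range(n):
        mu = start + (i + 0.5) * h
        weight = math.exp(-0.5 * (mu / width) ** 2) / (width * math.sqrt(2.0 * math.pi))
        total += weight * (f(mu + lam) + f(mu - lam))
    return A * A * total * h / 4.0


def lorentz(mu):
    """An even spectral density chosen only for illustration."""
    return 1.0 / (1.0 + mu * mu)


print(phase_average(1.3, 0.4, 2.0, A=3.0), 4.5 * math.cos(1.8))
for lam in (0.5, 1.0, 2.0):
    print(lam, sideband_mean(lam, lorentz, width=0.01), lorentz(lam) / 2.0)
```

Because only $\lambda$ enters the shift, the same Gaussian weight is used at every oscillator frequency, which is the point of the heterodyne arrangement.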

It might be of interest to consider the estimates obtained by applying the heterodyne principle (i.e., the procedure described above using a local oscillator). We have already discussed these estimates in section 4.5. For convenience the complex representation of the process was used there, so that

$$z_t = \mathfrak{F}\mathfrak{M}x_t = \mathfrak{F}\,x_t e^{it\lambda} = \int_0^t x_s e^{is\lambda} g(t-s)\,ds,$$

where $g(u)$ is the transient response function of $\mathfrak{F}$ and the observation interval is $(0, T)$. The average power is then

$$\frac{1}{T}\int_0^T |z_t|^2\,dt = \frac{1}{T}\int_0^T\!\!\int_0^T\!\!\int_0^T x_u x_v\, e^{i(u-v)\lambda} g(t-u)g(t-v)\,du\,dv\,dt,$$

since $g(u) = 0$ for $u < 0$. The integrand is symmetric in $u$ and $v$ apart from the factor $e^{i(u-v)\lambda}$, so only its real part contributes, and our estimate of $f(\lambda)$ is

$$a^* = \frac{1}{T}\int_0^\infty\!\!\int_0^\infty x_u x_v\, b_T(u,v)\,du\,dv, \qquad (1)$$

where

$$b_T(u,v) = \cos\big((u-v)\lambda\big)\int_0^T g(t-u)g(t-v)\,dt;$$

as this function is zero for $u > T$ or $v > T$, the integrals in (1) extend only over the square with sides $(0, T)$.
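The passage from the average power to the quadratic form (1) is an algebraic identity, and a discrete-time analogue can be checked directly. In the sketch below the integrals become sums over integer times, $g$ is a hypothetical truncated exponential response (zero beyond a fixed lag, as with $h(T)$), and the data are an arbitrary deterministic test series; the two functions should agree to rounding error.

```python
import cmath
import math


def average_power(x, lam, g):
    """(1/n) sum_k |z_k|^2 with z_k = sum_{j <= k} x_j e^{i j lam} g[k - j].

    Discrete-time stand-in for z_t; g[u] is taken as 0 for u < 0 and for
    u >= len(g), mimicking a response that vanishes beyond h(T)."""
    n = len(x)
    total = 0.0
    for k in range(n):
        z = 0j
        for j in range(max(0, k - len(g) + 1), k + 1):
            z += x[j] * cmath.exp(1j * j * lam) * g[k - j]
        total += abs(z) ** 2
    return total / n


def quadratic_form(x, lam, g):
    """(1/n) sum_{u,v} x_u x_v b(u, v), b(u, v) = cos((u - v) lam) sum_k g(k-u) g(k-v)."""
    n = len(x)

    def gval(u):
        return g[u] if 0 <= u < len(g) else 0.0

    total = 0.0
    for u in range(n):
        for v in range(n):
            overlap = sum(gval(k - u) * gval(k - v) for k in range(n))
            total += x[u] * x[v] * math.cos((u - v) * lam) * overlap
    return total / n


x = [math.sin(0.7 * t) + 0.3 * math.cos(2.1 * t) for t in range(30)]
g = [math.exp(-u / 4.0) for u in range(8)]  # hypothetical truncated response
print(average_power(x, 1.2, g), quadratic_form(x, 1.2, g))
```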

The bias and variance of estimates of this family have been discussed in some detail in section 4.5. Let $g(u) = 0$ for $u > h(T)$, where $h(T) = o(T^{1/2})$. The bias is then

$$E a^* - f(\lambda) = \int_{-\infty}^{\infty}|\varphi(x)|^2\,[f(x) - f(\lambda)]\,dx + o\!\left(\frac{1}{\sqrt{T}}\right).$$

On the other hand, it is clear from section 4.5 that the variance of $a^*$ is given by

$$\sigma^2(a^*) \approx \frac{1}{T^2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\frac{\sin^2\frac{T}{2}(x-y)}{\left(\frac{x-y}{2}\right)^2}\,|\varphi(x)|^2\,|\varphi(y)|^2 f(x) f(y)\,dx\,dy$$

asymptotically. This suggests that, under reasonable conditions on the rate at which $|\varphi(x)|^2$ becomes a delta function at $\lambda$, the variance is asymptotically the same as

$$\frac{2\pi}{T}\,f^2(\lambda)\int_{-\infty}^{\infty}|\varphi(x)|^4\,dx.$$

The reader may be interested in another estimation method, proposed by Grützmacher [1].

One could also estimate the covariance function and then use the Fourier inversion formula to get an estimate of the spectral density. It has been noted in the engineering literature that one should use only the parts of the covariance function corresponding to moderate lags, as the rest of it has a large relative error due to sampling variability (see McCombie [1]). One would then be led to the continuous parameter analogues of the spectrograph estimates that were analyzed in some detail in Chapter 4. Although we were concerned with discrete parameter processes in our discussion of spectrograph estimates in Chapter 4, it is clear that the results could be extended to continuous time parameter processes. Such an extension would be more natural in this context.
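A discrete-time sketch of the truncated-covariance idea: estimate the covariances, keep only lags below $M$, weight them, and invert. The Bartlett weights $1 - |v|/M$ are one possible choice, used here only for illustration, and the test series is arbitrary. Two properties are checked: the estimate integrates to the sample variance over $(-\pi, \pi]$, and with these weights it is never negative.

```python
import math


def sample_autocov(x, max_lag):
    """Biased sample autocovariances r_v, v = 0..max_lag (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    return [sum(d[t] * d[t + v] for t in range(n - v)) / n for v in range(max_lag + 1)]


def lag_window_estimate(x, lam, M):
    """(1/2pi) * sum_{|v| < M} (1 - |v|/M) r_v cos(v lam): only lags below M
    are used, with Bartlett weights (an illustrative choice of window)."""
    r = sample_autocov(x, M)
    total = r[0]
    for v in range(1, M):
        total += 2.0 * (1.0 - v / M) * r[v] * math.cos(v * lam)
    return total / (2.0 * math.pi)


x = [math.sin(0.3 * t) + ((t * 7919) % 13) / 13.0 for t in range(200)]
for lam in (0.0, 0.3, 1.0, 2.5):
    print(lam, lag_window_estimate(x, lam, M=20))
```

The nonnegativity follows because this estimate equals the periodogram smoothed by the (nonnegative) Fejér kernel; other windows need not share that property.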

If one wants greater precision in the computation, a digital computer will be preferred to an analogue computer. In such a case the data will be discretized so that they can be fed into the digital computer. If the original time series is $x_t$, $0 \le t \le T$, the result of discretizing the data will be $x_{Tk/n}$, $k = 0, \ldots, n$.

When we have a sample of $n$ observations and want to obtain the spectrum of the underlying process, two types of limitation occur. The first is due to the finite sample size: one cannot get much information about the very fine structure, i.e., one cannot distinguish between frequencies very close to each other, because of the finite resolvability of the estimate. The other limitation concerns the detection of high frequencies and is due to the discrete nature of the sampling. This was discussed in section 1.9.D. It should be pointed out that, when one samples continuously, the inertia of the recording instruments will introduce errors (usually of high frequency) which will mask the spectral density in the high frequency range.
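The high-frequency limitation of discrete sampling can be seen in a few lines. With sampling interval $\Delta$ (here an illustrative $\Delta = 0.1$), angular frequencies that differ by a multiple of $2\pi/\Delta$ produce identical samples, so a component above the folding frequency $\pi/\Delta$ is indistinguishable from one inside the band. The helper names are hypothetical.

```python
import math


def sampled_cosine(freq, dt, n, phase=0.0):
    """cos(freq * t + phase) at t = 0, dt, ..., (n - 1) dt; freq in radians per unit time."""
    return [math.cos(freq * k * dt + phase) for k in range(n)]


def alias(freq, dt):
    """Fold an angular frequency into the principal band [-pi/dt, pi/dt)."""
    nyquist = math.pi / dt
    return (freq + nyquist) % (2.0 * nyquist) - nyquist


dt = 0.1                    # sampling interval (illustrative value)
high = 40.0                 # above the folding frequency pi/dt (about 31.4)
low = alias(high, dt)       # the in-band frequency it is confused with
a = sampled_cosine(high, dt, 50, phase=0.4)
b = sampled_cosine(low, dt, 50, phase=0.4)
print(low, max(abs(p - q) for p, q in zip(a, b)))
```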

The reader will find a treatment of similar questions in Davenport-Johnson-Middleton [1] and Spetner [1].

5.3.  Turbulence 

Interest in problems of turbulence has grown in the past few decades. In the study of turbulence the model of a stationary process has been fruitful. Because of this we append this section, in which a few selected and simple topics in the theory of turbulence are discussed. The first question that arises is the nature of turbulence. We do not claim to give any definitive answer; in fact, there is still some disagreement among research workers as to what they would call turbulence. Nonetheless, we will discuss a few characteristics of turbulence that most research workers would agree upon as basic.

Perhaps the best way to describe turbulence is to discuss a simple physical context in which it arises. Consider a regular grid of bars with spacing $M$ centimeters between the bars. Suppose that there is a uniform flow of a fluid with speed $U$ centimeters per second perpendicular to the grid and through it. Assume that the kinematic viscosity $\nu$ of the fluid is small. The Reynolds number $UM/\nu$ of the grid is then large. If the velocity field of the flow, after it has passed through the grid, is measured repeatedly under the same macroscopic conditions (the boundary conditions being held fixed), one notes that the measurements do not coincide but vary substantially, in what seems to be a random manner, about the mean velocity. This is what happens in a wind tunnel when one generates turbulence. The random variation cannot be explained by errors in measurement. It has been suggested that the random character of the velocity field is due to the instability of the flow. As a deterministic model seems inadequate in such a context, it is reasonable to look for an appropriate statistical approach.

Let us see how a statistical approach is suggested by deterministic considerations. If a deterministic approach were appropriate, the fluid flow would be specified by the equations of motion and the boundary conditions. Once the boundary conditions are fixed, it is physically plausible that the course of the flow would be completely determined by the equations of motion. Turbulence may be due to the solutions of the equations being unstable, so that a small perturbation (which one cannot detect and over which one has no control) changes the solution radically. It then seems reasonable to consider a probability distribution of velocity fields, each of which satisfies the equations of motion. This is in fact the statistical approach used in the theory of turbulence.

Assume that the fluid is incompressible. Let $u(x,t)$ be the velocity at the point $x$ at time $t$. Incompressibility is a reasonable assumption if the root mean square velocity $\sqrt{E|u|^2}$ of the fluid is small compared to the velocity of sound in the fluid. The velocity field $u(x,t)$ then satisfies the continuity equation

$$\nabla\cdot u = \sum_{i=1}^{3}\frac{\partial u_i}{\partial x_i} = 0.$$

Here $u_i$ and $x_i$ are the $i$th components of $u$ and $x$ respectively. Let $\rho$ be the density of the fluid and $p(x,t)$ the pressure at $x$ at time $t$. The velocity field satisfies the Navier-Stokes equation

$$\frac{\partial u}{\partial t} + u\cdot\nabla u = -\frac{1}{\rho}\nabla p + \nu\nabla^2 u.$$

Here $\nabla u$ is the gradient of $u$ and $\nabla^2 u$ is the Laplacian of $u$. The term $u\cdot\nabla u$ is often referred to as the inertial term; the term $\nu\nabla^2 u$ is due to the viscous forces.
The general problem of turbulence is very complicated. For this reason, symmetry conditions that simplify the problem are introduced. These symmetry conditions lead to idealized models of turbulence, but even these idealized models are sometimes approached in experimental situations. Homogeneous turbulence is such a model. In homogeneous turbulence all space is considered turbulent, and the statistical structure of the turbulence is assumed invariant under space translation. This means that the velocity $u(x,t)$ is a stochastic process in $(x,t)$ that is stationary in $x$, that is, $\{u(x_j + y, t_j);\ j = 1, \ldots, n\}$ has the same probability distribution as $\{u(x_j, t_j);\ j = 1, \ldots, n\}$ for every $y$. Though homogeneous turbulence is an idealization, it can be realized approximately in practice. In the case of turbulence generated by a flow passing through a grid, the turbulence dies off slowly with the distance from the grid; locally this turbulence can be considered homogeneous. This is probably also the case in measurements of atmospheric wind velocities if one restricts oneself to a sufficiently small volume. Homogeneous turbulence is at times further qualified by the assumption of isotropy, i.e., that the probability distribution is invariant under rotation and reflection. By this we mean the following. Let $x_1, \ldots, x_n$ be any $n$ points and $a_1, \ldots, a_n$ any fixed directions at these points. Consider the joint probability distribution of the components of the velocities at $x_1, \ldots, x_n$ in the directions $a_1, \ldots, a_n$ respectively. This distribution is assumed to be invariant under any rigid rotation, or reflection in any plane, of the configuration formed by the $n$ points and the associated directional vectors. The homogeneous turbulence generated by flow through a grid seems to be locally isotropic at a small distance from the grid. This is certainly not true of atmospheric wind velocities close to the earth's surface: one would not expect isotropic turbulence close to the boundary of the region of fluid flow. Isotropic turbulence might develop at some height above the earth's surface.

Intuitively one feels that a turbulent velocity field is a superposition of different modes of motion, and one way to study turbulence is to decompose it into these modes. A convenient way of doing this is to carry out a Fourier analysis of homogeneous turbulence with respect to $x$. Some workers feel that this particular decomposition is not the natural one, but at present they have not suggested a concrete alternative. For convenience we subtract the mean velocity field from the turbulent velocity field, so that $Eu(x,t) = 0$. Since $u(x,t)$ is a process stationary in the space variable $x$, it has the representation

$$u(x,t) = \int_K e^{ik\cdot x}\,dz(k,t),$$
where $z(k,t)$ is a vector-valued stochastic process that is an orthogonal set function in wave number space $K$. The dependence of functions on $t$ will sometimes be omitted. The covariance matrix is

$$R(r) = \{R_{ij}(r)\} = \int_K e^{ik\cdot r} f(k)\,dk,$$

where $f(k) = \{f_{ij}(k)\}$ is the spectral matrix. Here

$$E\,dz_i(k)\,\overline{dz_j(k')} = f_{ij}(k)\,dk\;\delta_{k,k'}.$$

It seems appropriate to call $f(k)$ the energy spectrum matrix, since

$$R_{ij}(r) = E\,u_i(x)\,u_j(x+r),$$

so that $E|u|^2 = \sum_i R_{ii}(0) = \int_K \sum_i f_{ii}(k)\,dk$.

Note that absolute continuity of the spectrum has been assumed. We will see, however, that this is proper, since the nonlinear or inertial term in the equations of motion acts as a transfer term and would immediately spread the energy of a line or jump in the spectrum over a continuous range of wave numbers. The velocity field $u(x,t)$ is now expressed as a linear superposition of Fourier components $e^{ik\cdot x}dz(k,t)$, and we identify these with the different modes of motion. These Fourier components are sometimes referred to as "eddies". Note that the term eddy as used here does not agree with common usage, since it is in fact a plane wave. The study of turbulence is now the study of these Fourier components. Since $k$ has the dimension $[\text{length}]^{-1}$, when we speak of small eddies we mean Fourier components corresponding to large $|k|$; large eddies correspond to small $|k|$.

Integrate $f_{ij}(k)$ over all directions of the vector argument $k$, that is, over the sphere $|k| = k$, so as to obtain

$$\psi_{ij}(k) = \int_{|k| = k} f_{ij}(k)\,dS.$$

Now $\psi_{ij}(k)\,dk$ is the contribution to $Eu_iu_j$ from wave numbers $k$ with $k < |k| < k + dk$. The functions $\psi_{ij}(k)$ are important in the study of isotropic turbulence because there the dependence of $f_{ij}(k)$ on the direction of $k$ is determined by spherical symmetry. The function

$$E(k) = \frac{1}{2}\sum_{i=1}^{3}\psi_{ii}(k)$$

is called the spectral energy function, because it is the contribution to the kinetic energy from Fourier components with wave numbers $|k| = k$. The mean kinetic energy per unit mass of fluid is

$$\frac{1}{2}E|u|^2 = \int_0^\infty E(k)\,dk.$$


Let us see what we can infer from the equations of motion about homogeneous turbulence. The equation of continuity

$$\nabla\cdot u = \int_K e^{ik\cdot x}\, ik\cdot dz(k) = 0$$

implies that

$$k\cdot dz(k) = 0.$$

Thus $dz(k)$ is orthogonal to $k$. By $dz(k)$ we mean the value of the spectral stochastic set function $z$ on a small interval in wave number space centered at the point $k$. The Navier-Stokes equation can be written in the form

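$$\frac{\partial u}{\partial t} + u\cdot\nabla u + \frac{1}{\rho}\nabla p - \nu\nabla^2 u = \int_K e^{ik\cdot x}\frac{\partial}{\partial t}dz(k) + i\int_K\!\int_{K'} e^{ik\cdot x}\,dz(k)\cdot k'\,e^{ik'\cdot x}\,dz(k') + i\int_K k\,e^{ik\cdot x}\,dw(k) + \nu\int_K k^2 e^{ik\cdot x}\,dz(k) = 0,$$

where $p(x)/\rho = \int_K e^{ik\cdot x}\,dw(k)$. But this can hold if and only if

$$\frac{\partial}{\partial t}dz(k) + i\int_{K'} k'\cdot dz(k-k')\,dz(k') + ik\,dw(k) + \nu k^2\,dz(k) = 0. \qquad (1)$$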


Note that $z(k)$ and $w(k)$ are the integrated Fourier transforms of $u(x)$ and $p(x)/\rho$ (the integrated Fourier transforms exist, but the Fourier transforms do not). If the divergence of the Navier-Stokes equation is taken, the following equation relating the pressure and the velocity field is obtained:

$$\frac{1}{\rho}\nabla^2 p = -\sum_{i,j}\frac{\partial^2 (u_iu_j)}{\partial x_i\,\partial x_j}.$$

On taking the Fourier transform of this equation we obtain

$$dw(k) = -\frac{1}{k^2}\int_{K'} k\cdot dz(k-k')\;k\cdot dz(k').$$
An integral of this form has not been defined previously. However, integrals of such a form can be defined in a way analogous to that used in Chapter 1 if higher order moments of $u$ are assumed to exist. The continuity equation can be rewritten in the form

$$k\cdot dz(k-k') = k'\cdot dz(k-k').$$

Using this and the representation of $dw(k)$, we find that (1) can be rewritten as

$$\frac{\partial}{\partial t}dz(k) = i\int_{K'} k'\cdot dz(k-k')\left[-dz(k') + \frac{k}{k^2}\,k\cdot dz(k')\right] - \nu k^2\,dz(k).$$


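It follows from this equation that the rate of change of energy associated with $dz(k)$ is given by

$$\frac{\partial}{\partial t}\,dz_i(k)\,\overline{dz_j(k)} = -i\int_{K'}\Big[k'\cdot dz(k-k')\,dz_i(k')\,\overline{dz_j(k)} - \overline{k'\cdot dz(k-k')}\;dz_i(k)\,\overline{dz_j(k')}\Big] + \frac{i}{k^2}\int_{K'}\Big[k'\cdot dz(k-k')\;k\cdot dz(k')\;k_i\,\overline{dz_j(k)} - \overline{k'\cdot dz(k-k')\;k\cdot dz(k')}\;k_j\,dz_i(k)\Big] - 2\nu k^2\,dz_i(k)\,\overline{dz_j(k)}.$$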

The three terms on the right give the effects of inertial, pressure and viscous forces respectively. The term due to the viscous forces is linear, and if no other forces were present $dz(k)$ would decrease with time as $e^{-\nu k^2 t}$. This term is the one that causes the dissipation of energy in turbulence. Note that $\overline{dz(k)} = dz(-k)$, since $u(x)$ is real. It follows from this that

$$\frac{\partial}{\partial t}\int_K dz_i(k)\,\overline{dz_j(k)}$$

is not affected by inertial forces. The effect of the inertial forces is to transfer energy from one part of wave number space to another without changing the total amount of energy associated with any directional component of the energy. On using the fact that $k\cdot dz(k) = 0$, the contribution to $\frac{\partial}{\partial t}\big(dz(k)\cdot\overline{dz(k)}\big)$ from the pressure forces is seen to be zero, so that the effect of the pressure forces is to transfer energy from one directional component of $dz(k)$ to another.
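In the rewritten form of (1), the inertial forcing at wave number $k$ enters through the bracket $-dz(k') + \frac{k}{k^2}\,k\cdot dz(k')$, i.e. through $-(I - kk^{T}/k^{2})\,dz(k')$: the pressure term removes exactly the component of the forcing along $k$, so that continuity, $k\cdot dz(k) = 0$, is preserved in time. The sketch below checks this projection for a single wave-number vector; the complex "forcing" vector is an arbitrary stand-in, not a solution of the equations, and the helper names are hypothetical.

```python
def dot(k, a):
    """Bilinear dot product sum_i k_i a_i (k real, a possibly complex)."""
    return sum(ki * ai for ki, ai in zip(k, a))


def project_transverse(k, a):
    """(I - k k^T / |k|^2) a: remove from a its component along k."""
    c = dot(k, a) / dot(k, k)
    return [ai - c * ki for ki, ai in zip(k, a)]


def pressure_part(k, a):
    """The component (k k^T / |k|^2) a of a along k, which the pressure removes."""
    c = dot(k, a) / dot(k, k)
    return [c * ki for ki in k]


k = [1.0, 2.0, 2.0]                 # a wave-number vector (illustrative)
forcing = [1 + 1j, -2.0, 0.5j]      # arbitrary stand-in for the inertial forcing
p = project_transverse(k, forcing)
print(p, dot(k, p))
```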
The exchange of energy between the different modes of motion (the different Fourier components) and the dissipation of energy are basic aspects of turbulence. Much of the study of turbulence today is centered on these two important aspects, which are not completely understood. Experimental results have shown that $u(x,t)$, the velocity at a point, is normally distributed, but that $u(x,t)$ and $u(x',t)$, the velocities at two distinct points, are not jointly normally distributed. Thus $u(x,t)$ is not a normal process. This is not surprising since, as we have seen, turbulence is a nonlinear phenomenon.

One can see from equation (1) that

$$\frac{d}{dt}\int_0^\infty \frac{1}{2}\sum_i\psi_{ii}(k)\,dk = \frac{d}{dt}\,\frac{1}{2}\int_K E\,dz(k)\cdot\overline{dz(k)} = -\nu\int_K k^2\,E\,dz(k)\cdot\overline{dz(k)} = -2\nu\int_0^\infty k^2 E(k)\,dk.$$

The rate at which energy per unit mass is lost is therefore $2\nu\int_0^\infty k^2E(k)\,dk$. Because of the factor $k^2$, the small eddies lose energy by viscous dissipation more rapidly than the large eddies. Since the effect of inertial forces is to spread energy over the range of wave numbers, it is plausible that the net transfer of energy is to regions of low energy density in wave number space. Since viscous dissipation is more rapid at large wave numbers, the inertial transfer of energy is from small wave numbers to large wave numbers, that is, from large eddies to small eddies. A similar physical argument implies that the transfer of energy by pressure forces will equalize the mean squares of the directional components. It therefore distributes the energy associated with the volume element $dk$ of wave number space uniformly over directions normal to $k$. This implies that there would be a tendency towards isotropy. If we now think in terms of a box in which turbulence is set up, this tendency towards isotropy would manifest itself only for wave numbers appreciably larger than $1/L$, where $L$ is the dimension of the box.
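The weighting by $k^2$ is what shifts dissipation towards the small eddies. The sketch below computes the energy $\int E(k)\,dk$, the dissipation rate $2\nu\int k^2E(k)\,dk$, and the shares of each coming from wave numbers above a cutoff $K$, for the illustrative model $E(k) = e^{-k}$ (not a turbulence spectrum from the text). For $K = 3$ about 5% of the energy but about 42% of the dissipation lies above the cutoff.

```python
import math


def integrate(func, a, b, n=60_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b)) + sum(func(a + i * h) for i in range(1, n))
    return total * h


def energy_budget(E, nu, K, k_max=60.0):
    """Return (energy, dissipation rate, share of energy above K,
    share of dissipation above K) for a spectral energy function E(k)."""
    energy = integrate(E, 0.0, k_max)
    dissipation = 2.0 * nu * integrate(lambda k: k * k * E(k), 0.0, k_max)
    energy_tail = integrate(E, K, k_max) / energy
    dissipation_tail = 2.0 * nu * integrate(lambda k: k * k * E(k), K, k_max) / dissipation
    return energy, dissipation, energy_tail, dissipation_tail


budget = energy_budget(lambda k: math.exp(-k), nu=0.01, K=3.0)
print(budget)  # roughly (1.0, 0.04, 0.050, 0.42)
```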

It would be of great interest to find out something about the form of the spectral energy function $E(k)$. Consider the range of wave numbers $k_0 < k < k_s$. Let $1/k_0$ be of the same magnitude as the box in which turbulence has been set up. Beyond $k_s$ are the small eddies, where only viscosity is important, and we should like to get some idea about the form of $E(k)$ in the range $k_0 < k < k_s$. Assume that the turbulence is stationary in time. This is a reasonable assumption if energy is fed in at the boundaries of the box at a rate adequate to replace the energy dissipated in the box. Let $\epsilon_k$ denote the rate of transfer of energy from wave numbers less than $k$ to wave numbers greater than $k$. This is a constant since the turbulence is stationary in time. We now give a dimensional argument used by Kolmogorov to obtain information about the form of $E(k)$. Kolmogorov makes two basic assumptions. There are two parameters: $\epsilon$, the constant rate of dissipation of energy, and the viscosity $\nu$. The first assumption is that statistical considerations in the range $k_0 < k < k_s$ depend only on these two parameters. His second assumption is that $E(k)$ is independent of $\nu$ when $\nu \to 0$ (the Reynolds number then approaches infinity). The dimensional argument is as follows. First note that

$$\left(\frac{\nu^3}{\epsilon}\right)^{1/4} = [\text{length}],\qquad (\nu\epsilon)^{1/4} = [\text{velocity}],\qquad E(k) = [\text{velocity}]^2\times[\text{length}].$$

By the first assumption,

$$E(k) = (\nu^5\epsilon)^{1/4}\,\Phi\!\left(k\left(\frac{\nu^3}{\epsilon}\right)^{1/4}\right),$$

where $\Phi$ is a universal function. Using the second assumption one sees that one must have

$$\Phi(x) = Cx^{-5/3},$$

so that $E(k) = C\epsilon^{2/3}k^{-5/3}$.

This form is confirmed by experimental data.
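The dimensional bookkeeping can be checked with exact rational arithmetic. The sketch below represents dimensions as (length, time) exponents, assuming $[\nu] = \text{length}^2/\text{time}$ and $[\epsilon] = \text{length}^2/\text{time}^3$ (energy per unit mass per unit time). It confirms the length and velocity scales used above, and that a power law $\Phi(x) = Cx^{s}$ makes $\nu$ drop out exactly when $s = -5/3$, leaving $\epsilon^{2/3}$.

```python
from fractions import Fraction as F

# Dimensions as (length exponent, time exponent).
NU = (F(2), F(-1))    # kinematic viscosity: length^2 / time
EPS = (F(2), F(-3))   # dissipation rate per unit mass: length^2 / time^3
K = (F(-1), F(0))     # wave number: 1 / length


def dims(*factors):
    """Dimensions of a product of powers, given (dimension, exponent) pairs."""
    return (sum(d[0] * e for d, e in factors), sum(d[1] * e for d, e in factors))


def exponents_for_slope(s):
    """Exponents of nu and eps in (nu^5 eps)^(1/4) * (k * eta)^s with
    eta = (nu^3 / eps)^(1/4), i.e. the case Phi(x) = C * x**s."""
    s = F(s)
    return F(5, 4) + F(3, 4) * s, F(1, 4) - F(1, 4) * s


print(dims((NU, F(3, 4)), (EPS, F(-1, 4))))   # dimensions of a length
print(dims((NU, F(1, 4)), (EPS, F(1, 4))))    # dimensions of a velocity
print(exponents_for_slope(F(-5, 3)))          # nu drops out
```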

An  extended  discussion  of  turbulence  is  to  be  found  in  G.  K.  Batchelor's 
book,  The  Theory  of  Homogeneous  Turbulence  [1]. 




5.4.  Measuring  Turbulence  Spectra 


In the last section we discussed homogeneous turbulence, carefully avoiding all difficult or technical aspects. It should be clear from that section that the spectrum is one of the most important concepts for describing turbulence, although it does not furnish a complete description because of the nonnormality of the velocity field.

We will now discuss briefly some methods of actually measuring spectra. One of the most important devices for this purpose is the hot wire anemometer. Denote the velocity at time $t$ at the point of measurement by $u(t)$. If the scale of the turbulence is large compared to the length of the wire, then

$$M\frac{dv(t)}{dt} + v(t) = c\,u(t),$$

where $v(t)$ is the voltage with which the anemometer responds to $u(t)$. Here $M$ is the time constant and $c$ the sensitivity of the anemometer (see H. Liepmann [1]). By sending $v(t)$ through a linear circuit with the frequency response function $\varphi(\lambda)$ and measuring the mean power, we clearly estimate the quantity

$$c^2\int_0^\infty \frac{|\varphi(\lambda)|^2 f(\lambda)}{1 + M^2\lambda^2}\,d\lambda.$$

If $\varphi(\lambda)$ is concentrated around a frequency $\lambda_0$, we estimate approximately

$$\frac{c^2 f(\lambda_0)}{1 + M^2\lambda_0^2}\int_0^\infty |\varphi(\lambda)|^2\,d\lambda.$$
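The factor $c^2/(1+M^2\lambda^2)$ is the squared gain of the anemometer equation viewed as a first-order filter with response $c/(1+iM\lambda)$. The sketch below integrates $Mv' + v = c\cos(\lambda t)$ with a classical Runge-Kutta scheme, checks that after the transient has died out the voltage matches the closed-form steady state, and shows how a measured density could be divided by the gain to compensate. The parameter values and the helper `compensate` are illustrative assumptions.

```python
import math


def power_gain(lam, M, c):
    """|c / (1 + i M lam)|^2 = c^2 / (1 + M^2 lam^2)."""
    return c * c / (1.0 + (M * lam) ** 2)


def steady_state(t, lam, M, c):
    """Steady-state solution of M v' + v = c cos(lam t)."""
    return c * (math.cos(lam * t) + M * lam * math.sin(lam * t)) / (1.0 + (M * lam) ** 2)


def simulate(lam, M, c, t_end=20.0, dt=1e-3):
    """Integrate M v' + v = c cos(lam t) from v(0) = 0 with classical RK4."""
    def rhs(t, v):
        return (c * math.cos(lam * t) - v) / M

    v = 0.0
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        k1 = rhs(t, v)
        k2 = rhs(t + dt / 2, v + dt * k1 / 2)
        k3 = rhs(t + dt / 2, v + dt * k2 / 2)
        k4 = rhs(t + dt, v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return steps * dt, v


def compensate(measured_density, lam, M, c):
    """Hypothetical helper: undo the first-order response, f = f_v / gain."""
    return measured_density / power_gain(lam, M, c)


t_end, v_end = simulate(3.0, 0.5, 2.0)
print(v_end, steady_state(t_end, 3.0, 0.5, 2.0))
```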


There  are  several  things  that  should  be  observed  at  this  point.  We  have 
been  measuring  time  averages  instead  of  the  spatial  averages  that  were 
used  to  define  the  spectrum  of  the  turbulence.  However,  according  to 
Taylor's  hypothesis  the  velocity  field  of  the  turbulent  motion  is  carried 
past  a  fixed  point  without  any  material  change,  i.e.,  we  have  a  frozen  pattern 
of  turbulence  moving  with  the  mean  velocity  U  of  the  fluid.  If  we  are  willing 
to  accept  this  hypothesis,  observed  time  averages  can  be  converted  into 
space  averages  if  we  know  U. 
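Under the frozen-pattern hypothesis the spatial covariance is $R(\xi) = r(\xi/U)$, where $r$ is the covariance in time; with the convention $r(\tau) = \int e^{i\tau\lambda}f(\lambda)\,d\lambda$ this corresponds to the wave-number density $F(k) = U f(Uk)$. The sketch below performs that conversion and checks it numerically for an illustrative Gaussian frequency density with $r(\tau) = e^{-\tau^2/2}$ and $U = 5$; neither choice comes from the text.

```python
import math


def to_wavenumber_density(f_time, U):
    """Frozen-pattern conversion: if the pattern moves with speed U, the
    spatial covariance is R(xi) = r(xi / U), i.e. F(k) = U * f_time(U * k)."""
    return lambda k: U * f_time(U * k)


def cos_transform(density, shift, half_width, n=4000):
    """Midpoint-rule value of integral cos(shift * x) density(x) dx on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += math.cos(shift * x) * density(x)
    return total * h


def gauss(lam):
    """Frequency density whose covariance is r(tau) = exp(-tau^2 / 2) (illustrative)."""
    return math.exp(-0.5 * lam * lam) / math.sqrt(2.0 * math.pi)


U = 5.0
F_k = to_wavenumber_density(gauss, U)
xi = 2.0
print(cos_transform(F_k, xi, 20.0 / U), math.exp(-0.5 * (xi / U) ** 2))
```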

Another complication is caused by the fact that the length of the wire is finite, so that what we measure is an average over this length. We shall briefly study separately how this affects the observed spectrum. We may want to measure the one-dimensional spectral density

$$f_{11}(k_1) = \iint f_{11}(k)\,dk_2\,dk_3$$

(see section 5.3) corresponding to the stationary process $u_1(x,0,0)$. A hot wire of length $2l$ is set up parallel to the $x_3$-axis. Assume that its output is proportional to the integral of the velocity over the wire:

$$V(x_1,x_2,x_3) = a\int_{x_3-l}^{x_3+l} u_1(x_1,x_2,y_3)\,dy_3.$$

Then we can consider the measured quantity $V$ as the output of a filter $\mathfrak{F}$ with the frequency response function

$$a\int_{-l}^{l} e^{ik_3y}\,dy = 2a\,\frac{\sin k_3 l}{k_3},$$

so that (see section 1.9.A) the spectral density of the $V$ process is

$$4a^2\,\frac{\sin^2 k_3 l}{k_3^2}\,f_{11}(k_1,k_2,k_3).$$
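The averaging filter of the wire can be checked directly: integrating $e^{ik_3y}$ over $(-l, l)$ should reproduce $2a\sin(k_3l)/k_3$, and dividing by the ideal point-probe gain $2al$ gives the attenuation factor $(\sin k_3l/(k_3l))^2$ applied to $f_{11}$, which falls off once $k_3l$ is of order one. The sketch below uses the midpoint rule and an illustrative half-length $l = 0.5$, with $a = 1$.

```python
import cmath
import math


def wire_response(k3, ell, a=1.0):
    """Closed form a * integral_{-ell}^{ell} e^{i k3 y} dy = 2 a sin(k3 ell) / k3."""
    return 2.0 * a * ell if k3 == 0 else 2.0 * a * math.sin(k3 * ell) / k3


def wire_response_numeric(k3, ell, a=1.0, n=2000):
    """The same integral by the midpoint rule (complex-valued)."""
    h = 2.0 * ell / n
    return a * h * sum(cmath.exp(1j * k3 * (-ell + (j + 0.5) * h)) for j in range(n))


def attenuation(k3, ell):
    """Factor multiplying f_11 relative to an ideal point probe of gain 2 a ell."""
    return (wire_response(k3, ell) / (2.0 * ell)) ** 2


ell = 0.5
for k3 in (0.0, 1.0, 5.0, 20.0):
    print(k3, wire_response_numeric(k3, ell), wire_response(k3, ell), attenuation(k3, ell))
```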

While we are interested in the spectral density $\iint f_{11}(k_1,k_2,k_3)\,dk_2\,dk_3$, we actually estimate instead the density $\iint 4a^2\,[(\sin^2 k_3l)/k_3^2]\,f_{11}(k_1,k_2,k_3)\,dk_2\,dk_3$. It is clear that an estimate can be obtained just as in section 5.2. For a detailed discussion of how the first density can be recovered approximately from the second, see M. S. Uberoi and Leslie G. Kovasznay [1]. These authors also discuss the following method of measuring turbulence.

Consider a fluid with a fluctuating density $\rho(x_1,x_2,x_3)$ which can be thought of as a stationary process. We study the density fluctuations in the volume bounded by two planes perpendicular to the $x_3$-axis at the points $x_3 - l$ and $x_3 + l$ respectively. Light is sent through this volume and is received on a photographic plate. The relative fluctuations of the intensity of the incident light are described by a function $h(x_1,x_2,x_3)$ given by (see Uberoi and Kovasznay [1], p. 387)

$$h(x_1,x_2,x_3) = \int_{x_3-l}^{x_3+l}\left(\frac{\partial^2\rho}{\partial x_1^2} + \frac{\partial^2\rho}{\partial x_2^2}\right)(x_1,x_2,y_3)\,dy_3,$$

so that it is the result of $\rho$ put through a filter with the frequency response function

$$-2\,(k_1^2 + k_2^2)\,\frac{\sin k_3 l}{k_3}.$$

If the spectral density of $\rho(x)$ is called $f(k_1,k_2,k_3)$ and that of $h(x)$ is called $g(k_1,k_2,k_3)$, then (see section 1.9.A)

$$g(k_1,k_2,k_3) = 4\,(k_1^2 + k_2^2)^2\,\frac{\sin^2 k_3 l}{k_3^2}\,f(k_1,k_2,k_3).$$

Let us now assume that $\rho(x)$ is an isotropic process and introduce the surface spectral density on the sphere $|k| = k$, $\psi(k) = 4\pi k^2 f(k_1,k_2,k_3)$. Then the correlation function of the $h(x)$ process, for a translation $\xi = (\xi_1,\xi_2)$ parallel to the plate, is

$$\beta(\xi) = Eh(x)h(x+\xi) = \frac{1}{\pi}\iiint (k_1^2+k_2^2)^2\,\frac{\sin^2 k_3 l}{k_3^2}\,\frac{\psi(|k|)}{|k|^2}\,e^{-i(k_1\xi_1 + k_2\xi_2)}\,dk_1\,dk_2\,dk_3.$$

But if $l$ is large compared to the scale of turbulence, using a known property of the Fejér kernel we see that the above expression is approximately equal to

$$l\iint (k_1^2+k_2^2)\,\psi\big[(k_1^2+k_2^2)^{1/2}\big]\,e^{-i(k_1\xi_1 + k_2\xi_2)}\,dk_1\,dk_2.$$

Set

$$\xi_1 = a\cos\phi,\quad \xi_2 = a\sin\phi,\qquad k_1 = v\cos\theta,\quad k_2 = v\sin\theta.$$

Then

$$\beta(a) = 2\pi l\int_0^\infty \psi(v)\,J_0(va)\,v^3\,dv.$$
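The last step uses the standard fact that the plane Fourier integral of a radial function reduces to a Fourier-Bessel (Hankel) integral, $\iint F(|k|)\,e^{-ik\cdot\xi}\,dk = 2\pi\int_0^\infty F(v)J_0(va)\,v\,dv$ with $a = |\xi|$. The sketch below checks this for Gaussian test functions (illustrative, not turbulence spectra), computing $J_0$ from its integral representation so that only the standard library is needed. The helper `beta` is hypothetical and simply applies the reduction with $F(v) = l\,v^2\psi(v)$.

```python
import math


def bessel_j0(x, m=200):
    """J0(x) = (1/pi) * integral_0^pi cos(x sin(theta)) d theta, midpoint rule."""
    h = math.pi / m
    return sum(math.cos(x * math.sin((j + 0.5) * h)) for j in range(m)) / m


def radial_fourier_2d(F, a, v_max=10.0, n=1000):
    """2 pi * integral_0^v_max F(v) J0(v a) v dv, which equals the plane integral
    of F(|k|) exp(-i k . xi) over k when |xi| = a (F radial)."""
    h = v_max / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        total += F(v) * bessel_j0(v * a) * v
    return 2.0 * math.pi * total * h


def beta(psi, a, ell):
    """beta(a) = 2 pi l integral psi(v) J0(v a) v^3 dv, via radial_fourier_2d."""
    return ell * radial_fourier_2d(lambda v: v * v * psi(v), a)


a = 1.5
print(radial_fourier_2d(lambda v: math.exp(-v * v / 2), a), 2 * math.pi * math.exp(-a * a / 2))
```

For the Gaussian $e^{-v^2/2}$ the plane transform is $2\pi e^{-a^2/2}$, and multiplying by $v^2$ turns it into $2\pi(2-a^2)e^{-a^2/2}$; both values serve as checks.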

The quantity we want is $\psi(v)$. However, we can estimate $\beta(a)$ in a convenient way. The transparency of the photographic plate is

$$t(x) = d + c\,h(x),$$

where $d$ is the average transparency, so that if two identical photographic plates are prepared and placed parallel to each other with a translation $\xi = (\xi_1,\xi_2)$, the combined transparency is

$$t_2(x) = [d + c\,h(x)][d + c\,h(x+\xi)],$$

and hence $Et_2(x) = d^2 + c^2\beta(\xi)$.

After estimating the left member in the usual way, one could use the Fourier-Bessel inversion formula to estimate the quantity we desire, $\psi(v)$. However, it should be remembered that the periodogram was obtained in this way from the Fourier inversion formula and was not a very good estimate.
5.5.  Basic  Ideas  in  a  Statistical  Theory  of  Ocean  Waves

Very recently Willard J. Pierson, Jr., G. Birkhoff, and J. Kotik have proposed a model for storm-generated ocean surface waves (see Pierson [1] and Birkhoff-Kotik [1]). In this section we give a very brief outline of some of the basic ideas and assumptions on which this model is based.

Let the (x, y)-plane coincide with the free ocean surface when it is calm, the z-axis pointing vertically upward. The disturbed ocean is described by the height of its free surface, which is a function η(x, y, t) of position and time. Consider the velocity potential Φ and the pressure p. The density ρ of the water is considered constant. Under certain assumptions, of which we mention only one, namely that the waves be small, the motion is governed by the linear equations (see Pierson [1] or Lamb [1])

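$$\Delta\Phi = 0,$$

$$\frac{\partial\Phi}{\partial t} + \frac{p}{\rho} + gz = 0,$$

together with the appropriate boundary conditions at the ocean surface. One should note that this is a linear model. At the free surface the pressure is zero, so that the second equation becomes

$$\eta = -\frac{1}{g}\left.\frac{\partial\Phi}{\partial t}\right|_{z=0}.$$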


One possible solution for the surface is the wave

$$\eta(x,y,t) = \cos\left[\frac{\mu^2}{g}(x\cos\theta + y\sin\theta) - \mu t + \psi\right] = h(x,y,t\mid\mu,\theta) \qquad (1)$$

of amplitude 1 and phase ψ, with infinitely long wave crests perpendicular to the direction that forms an angle θ with the positive x-axis. The waves propagate in this direction, perpendicular to the wave crests, with speed g/μ. The frequency μ is the frequency of the oscillation at a fixed point on the ocean surface. The corresponding potential then is


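$$\Phi = \frac{g}{\mu}\,e^{\mu^2 z/g}\,\sin\left[\frac{\mu^2}{g}(x\cos\theta + y\sin\theta) - \mu t + \psi\right]$$

and the pressure at a level z < 0 below the surface is

$$p = \rho g\,e^{\mu^2 z/g}\cos\left[\frac{\mu^2}{g}(x\cos\theta + y\sin\theta) - \mu t + \psi\right] - \rho g z.$$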
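As a consistency check on the elementary wave (1), the sketch below evaluates its potential by central finite differences and confirms three things: ΔΦ = 0 below the surface, the dynamic surface condition η = −(1/g)∂Φ/∂t at z = 0, and the kinematic surface condition ∂Φ/∂z = ∂η/∂t at z = 0. It assumes that the fluid velocity is ∇Φ and that z points upward. The kinematic condition, the value g = 9.81 and the sample values of μ, θ, ψ are our additions for illustration, not taken from the text.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def phase(x, y, t, mu, theta, psi):
    k = mu * mu / G  # wave number mu^2/g, so crests move at mu/k = g/mu
    return k * (x * math.cos(theta) + y * math.sin(theta)) - mu * t + psi


def eta(x, y, t, mu, theta, psi):
    """Elementary wave (1) with unit amplitude."""
    return math.cos(phase(x, y, t, mu, theta, psi))


def potential(x, y, z, t, mu, theta, psi):
    """Velocity potential (g/mu) e^{mu^2 z/g} sin(phase), z measured upward."""
    return (G / mu) * math.exp(mu * mu * z / G) * math.sin(phase(x, y, t, mu, theta, psi))


def residuals(mu=0.8, theta=0.3, psi=0.5, x=1.2, y=-0.7, z=-0.4, t=2.0, h=1e-3):
    """Central-difference residuals of the Laplace equation at depth z and of the
    dynamic and kinematic surface conditions at z = 0 (all should be close to 0)."""
    def P(xx, yy, zz, tt):
        return potential(xx, yy, zz, tt, mu, theta, psi)

    def E(tt):
        return eta(x, y, tt, mu, theta, psi)

    c = P(x, y, z, t)
    laplace = (P(x + h, y, z, t) + P(x - h, y, z, t)
               + P(x, y + h, z, t) + P(x, y - h, z, t)
               + P(x, y, z + h, t) + P(x, y, z - h, t) - 6 * c) / h**2
    dphi_dt = (P(x, y, 0.0, t + h) - P(x, y, 0.0, t - h)) / (2 * h)
    dynamic = E(t) + dphi_dt / G  # eta = -(1/g) dPhi/dt at z = 0
    dphi_dz = (P(x, y, h, t) - P(x, y, -h, t)) / (2 * h)
    kinematic = dphi_dz - (E(t + h) - E(t - h)) / (2 * h)  # dPhi/dz = d(eta)/dt
    return laplace, dynamic, kinematic
```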
To get a realistic description of the sea surface Pierson proposes a superposition of waves of the type (1) with random phases, i.e.,

$$\eta(x,y,t) = \int_0^\infty\int_{-\pi/2}^{\pi/2}\cos\left[\frac{\mu^2}{g}(x\cos\theta + y\sin\theta) - \mu t + \psi(\mu,\theta)\right]\sqrt{2f(\mu,\theta)\,d\mu\,d\theta}, \qquad (2)$$

where ψ(μ, θ) is uniformly distributed on −π < ψ < π and ψ(μ, θ), ψ(μ′, θ′) are independent if (μ, θ) ≠ (μ′, θ′). This integral should be interpreted in the same way as the integral with random phases in section 1.5. Note that (2) can be considered as the usual sort of spectral representation over a two-dimensional spectrum, where the wave number λ = (λ₁, λ₂) has been transformed to polar coordinates

$$\lambda_1 = \mu\cos\theta,\qquad \lambda_2 = \mu\sin\theta.$$

The angle θ is allowed to take only values between −π/2 and π/2, which seems rather natural for a storm-generated wave system with main direction θ = 0.

It might be worthwhile to point out a few differences from the model usually used in the theory of turbulence. The random phase representation used in this model of the sea surface implies that the process η(x, y, t) is a normal process. The random integrals used in the model of turbulence need not be normal. In fact, it has already been noted that actual measurements indicate that turbulence is not a normal process (see Batchelor [1]). It is clear that one could modify this model of the sea surface and use the same representation as is used in the theory of turbulence so as to avoid the assumption of normality. Another difference is that while in turbulence considerable interest is devoted to the isotropic case, this is not so in the present context, as is seen from the restriction −π/2 < θ < π/2.

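The expression for the velocity potential is

$$\Phi = \int_0^\infty\int_{-\pi/2}^{\pi/2}\frac{g}{\mu}\,e^{\mu^2 z/g}\sin\left[\frac{\mu^2}{g}(x\cos\theta + y\sin\theta) - \mu t + \psi(\mu,\theta)\right]\sqrt{2f(\mu,\theta)\,d\mu\,d\theta}$$

and we get the pressure in a similar form.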
Consider the average potential energy

$$\frac{\rho g}{2T}\int_0^T \eta^2(x,y,t)\,dt$$

at some point (x, y) over the time period 0 ≤ t ≤ T. As the process is ergodic (see section 1.7; it is normal with a continuous spectrum), it follows that when T → ∞ the average potential energy converges (say in the mean) to

$$\frac{\rho g}{2}\,E\,\eta^2(x,y,t) = \frac{\rho g}{2}\int_0^\infty\int_{-\pi/2}^{\pi/2} f(\mu,\theta)\,d\mu\,d\theta.$$


The integral of f(μ, θ) over the half plane is then proportional to the average potential energy at any point in the plane. The form of the spectral density f(μ, θ) describes how the potential energy is distributed over various frequencies μ and directions θ. Let most of the spectral energy be in the neighborhood of μ₀, θ₀. The waves will then have fairly long crests and travel mainly in the θ₀ direction, and the surface will look fairly smooth if μ₀ is small. The situation will resemble that of swell. On the other hand, high frequency components will make the surface rougher. If θ has a greater range of variation, the crests will be shorter and we have a situation typical of sea.

As η(x, y, t) is normally distributed with mean zero and variance

$$\int_0^\infty\int_{-\pi/2}^{\pi/2} f(\mu,\theta)\,d\mu\,d\theta,$$

we know its distribution completely and can predict wave heights, consider ship motions and make other analyses of a similar nature.
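The random-phase model can be simulated directly. The sketch below discretizes the (μ, θ) half plane into cells and gives each cell an independent uniform phase and the amplitude √(2f·Δμ·Δθ), so that each cell contributes f·Δμ·Δθ to the variance. It then compares the empirical variance of η at a fixed point with the discretized integral of f and uses the normal law to get an exceedance probability. The spectrum below (a Gaussian bump around μ = 1 times cos² spreading), the grid, g = 9.81 and the seed are hypothetical choices for illustration only.

```python
import math
import random

G = 9.81  # m/s^2, assumed


def spectrum(mu, theta):
    """Hypothetical directional spectrum f(mu, theta), for illustration only."""
    return math.exp(-((mu - 1.0) / 0.2) ** 2) * (2 / math.pi) * math.cos(theta) ** 2


def make_grid(n_mu=20, n_theta=12, mu_max=2.0):
    """Midpoints of a (mu, theta) grid on (0, mu_max) x (-pi/2, pi/2), plus the cell area."""
    dmu, dth = mu_max / n_mu, math.pi / n_theta
    cells = [((i + 0.5) * dmu, -math.pi / 2 + (j + 0.5) * dth)
             for i in range(n_mu) for j in range(n_theta)]
    return cells, dmu * dth


def eta_realization(x, y, t, cells, area, rng):
    """Discretized random-phase superposition with independent phases psi ~ U(-pi, pi)."""
    total = 0.0
    for mu, th in cells:
        psi = rng.uniform(-math.pi, math.pi)
        arg = mu * mu / G * (x * math.cos(th) + y * math.sin(th)) - mu * t + psi
        total += math.cos(arg) * math.sqrt(2 * spectrum(mu, th) * area)
    return total


def exceedance(level, variance):
    """P(eta > level) for eta ~ N(0, variance)."""
    return 0.5 * math.erfc(level / math.sqrt(2 * variance))


cells, area = make_grid()
variance = area * sum(spectrum(mu, th) for mu, th in cells)  # approximates the double integral of f
rng = random.Random(1)
draws = [eta_realization(10.0, 5.0, 3.0, cells, area, rng) for _ in range(2000)]
empirical_variance = sum(d * d for d in draws) / len(draws)
```

For this particular spectrum the double integral equals 0.2·√π, and the probability that η exceeds two standard deviations is about 0.023.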

This model can be used in the study of the propagation of storm-generated ocean waves. We will briefly outline how this is done in Pierson's theory. Suppose that a storm has been blowing over a time interval 𝒯 = {0 < t < T} and a region D of the (x, y)-plane. Let us assume that initially there are no disturbances of the ocean surface outside the region 𝒯 × D. In other words, we consider η as locally stationary (over 𝒯 × D) and want to find out what happens outside this region. Consider the elemental wave h(x, y, t | μ, θ) and modify it by multiplying it by the characteristic function c(t, x, y) of the set 𝒯 × D:

$$g(x,y,t\mid\mu,\theta) = c(t,x,y)\,h(x,y,t\mid\mu,\theta).$$

We want to know what happens at some point p₀ = (x₀, y₀) at some future time point t₀. By making a Fourier analysis of g(x, y, t | μ, θ) one finds that, as a first approximation, the contribution of this wave to the point p₀ at time t₀ has died out unless (μ, θ) is in some set S(p₀, t₀). For (μ, θ) ∈ S(p₀, t₀) the corresponding amplitude of the wave is not changed very much. More precise results can be obtained; however, we are only sketching the idea and refer to Pierson, Neumann and James [1] for a thorough treatment. One can then consider the waves travelling to the point p₀ from D as having passed through a filter with the frequency response function

$$A(\mu,\theta) = \begin{cases}1 & \text{if } (\mu,\theta)\in S(p_0,t_0),\\ 0 & \text{otherwise.}\end{cases}$$

Hence the variable η(x₀, y₀, t₀) is normally distributed with mean zero and variance

$$\iint_{S(p_0,t_0)} f(\mu,\theta)\,d\mu\,d\theta.$$

If the spectral density is known, we can predict the average properties of the waves at p₀, t₀.

If S(p₀, t₀) is computed in a few simple but typical cases, it is seen that the further we go from the region D of wave generation, the narrower the band of frequencies that is singled out. This means that the waves behave more and more like swell. The amplitudes will clearly decrease as p₀ moves away. The "periods" of the swell will tend to decrease from high values to low values as time passes. All these characteristics have been observed (see Pierson [1]).
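The qualitative behaviour just described can be reproduced with a deliberately crude stand-in for S(p₀, t₀). This is not Pierson's construction. The sketch assumes a storm patch of width W active for 0 ≤ t ≤ T, an observation point at distance R straight downwind (θ = 0), and energy at frequency μ travelling at the deep-water group velocity g/(2μ). Frequency μ then reaches p₀ at time t₀ only if it left the patch during [0, T], and directions are limited to the angle the patch subtends. The spectrum, distances and times are all hypothetical.

```python
import math

G = 9.81  # m/s^2, assumed


def spectrum(mu, theta):
    """Hypothetical spectrum: Gaussian in mu around 1, cos^2 directional spreading."""
    return math.exp(-((mu - 1.0) / 0.2) ** 2) * (2 / math.pi) * math.cos(theta) ** 2


def pass_band(R, t0, T, W):
    """Crude S(p0, t0): arrival at distance R by time t0 at group velocity g/(2 mu),
    for energy that left the patch during [0, T]; directions within the patch's angle."""
    mu_lo = G * max(t0 - T, 0.0) / (2 * R)
    mu_hi = G * t0 / (2 * R)
    half_angle = math.atan(W / (2 * R))
    return mu_lo, mu_hi, half_angle


def filtered_variance(R, t0, T, W, n=200):
    """Midpoint rule for the double integral of f over the pass band."""
    mu_lo, mu_hi, a = pass_band(R, t0, T, W)
    dmu, dth = (mu_hi - mu_lo) / n, 2 * a / n
    return dmu * dth * sum(spectrum(mu_lo + (i + 0.5) * dmu, -a + (j + 0.5) * dth)
                           for i in range(n) for j in range(n))


def centered_t0(R, T):
    """Observation time at which the pass band is centred on mu = 1 (illustrative)."""
    return 2 * R / G + T / 2
```

Under these assumptions the band width is gT/(2R), so quadrupling the distance narrows the band by a factor of four. The band also moves to higher μ (shorter periods) as t₀ increases, in line with the observations quoted above.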

5.6.  Other  Applications 

It is clear that as soon as the fluctuating phenomena we study are of a random character and locally homogeneous, the stationary processes form an adequate class of models. Except for comparatively few cases in which one knows the whole structure of the process a priori except for a small number of parameters, it will be necessary to use a nonparametric approach like that outlined in the previous chapter.

There are many other applications. We shall make only a few more remarks on such applications. For example, when we have a filter and want to find its response function, it is possible to put pure sine waves or unit impulses into it and measure the output. This is not always the best procedure, however. Instead one might use some sort of noise of known spectrum as the input and measure the spectrum of the output. A slightly modified procedure of this sort has been suggested for finding the acoustic properties of a hall (see H. Thiede [1]), and it is easy to think of other cases suited to this sort of analysis.
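The noise-input procedure rests on the fact that a linear filter with response A(λ) multiplies the input spectral density by |A(λ)|². A minimal sketch follows. It feeds unit-variance Gaussian white noise (spectral density 1/(2π)) through a hypothetical two-tap filter, estimates the output spectrum by averaging periodograms over K non-overlapping segments, and divides by the input density. The filter coefficients, segment length L, number of segments K and seed are illustrative assumptions, and plain averaging of segment periodograms is only one of several possible spectral estimates.

```python
import cmath
import math
import random


def segment_periodogram(seg):
    """I(lambda_j) = |sum_t x_t e^{-i t lambda_j}|^2 / (2 pi L) at lambda_j = 2 pi j / L."""
    L = len(seg)
    out = []
    for j in range(L // 2 + 1):
        w = cmath.exp(-2j * math.pi * j / L)
        s = sum(x * w**t for t, x in enumerate(seg))
        out.append(abs(s) ** 2 / (2 * math.pi * L))
    return out


def estimate_gain(coeffs, L=32, K=800, seed=0):
    """Estimate |A(lambda_j)|^2 for y_t = sum_m coeffs[m] x_{t-m} from white-noise input."""
    rng = random.Random(seed)
    q = len(coeffs) - 1
    x = [rng.gauss(0.0, 1.0) for _ in range(L * K + q)]
    y = [sum(c * x[t + q - m] for m, c in enumerate(coeffs)) for t in range(L * K)]
    avg = [0.0] * (L // 2 + 1)
    for k in range(K):
        for j, value in enumerate(segment_periodogram(y[k * L:(k + 1) * L])):
            avg[j] += value / K
    return [2 * math.pi * value for value in avg]  # divide by input density 1/(2 pi)


def true_gain(coeffs, lam):
    return abs(sum(c * cmath.exp(-1j * m * lam) for m, c in enumerate(coeffs))) ** 2
```

For the filter [1.0, 0.3] the true gain is 1.09 + 0.6·cos λ. The averaged estimate should track it to within a few percent at the interior Fourier frequencies.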

Concerning econometrics and the social sciences, it should be pointed out that the stationarity assumption will often be valid only over short time intervals. We would therefore hesitate to apply any of the statistical techniques of this book in such a context, as it is questionable whether they would give as much as, or more than, the common-sense methods of the practical economist or social scientist. However, in a context where there is more quantitative theory, there would be some hope of finding appropriate statistical tools for the analysis of such time series.
CHAPTER 6


DISTRIBUTION  OF  SPECTRAL  ESTIMATES 


6.0.  Introduction 

In Chapter 4 we studied the problem of estimating the spectrum, say the spectral distribution function, the density function or the spectral mass in an interval. In Chapter 5 such estimates were applied to problems arising in various fields of application. Although large sample expressions for the bias and variance of such estimates have been obtained, it is clear that they describe only crudely the random variability of the estimates. For a more detailed study it is necessary to find the distributions of these statistics, or at least approximations to these distributions. In this chapter, on the basis of such results, we proceed to set up statistical tests of hypotheses or confidence regions.

6.1.  Preliminary  Remarks 

Many statisticians working with applications will agree that in practice one is more often interested in confidence statements than in the yes-no answer of a test. We shall therefore concentrate our attention on the construction of confidence regions and only in passing mention some tests. The confidence regions of greatest interest are those forming intervals in "parameter" space, i.e., bands of functions, of the type, say,

$$F_1(\lambda) \le F(\lambda) \le F_2(\lambda),\qquad -\pi \le \lambda \le \pi,$$

where F₁(λ), F₂(λ) depend upon the observed sample and F(λ) is the true spectral distribution function.

The  results  obtained  in  this  chapter  are  all  of  the  large  sample  variety, 
so  that  one  wonders  at  what  sample  size  the  errors  involved  in  using  these 
results  become  negligible  for  practical  purposes.  No  error  terms  are  derived 
and  at  present  it  does  not  seem  likely  that  one  can  obtain  error  terms  sharp 
enough  to  be  of  any  practical  use.  The  only  reasonable  alternative  seems  to 
be  to  consider  empirical  verification  by  an  artificial  sampling  procedure. 
So  far  only  a  small  scale  computational  program  has  been  carried  out  and 
we  will  present  some  spectral  analyses  of  artificially  generated  time  series 
in a later section. They are intended as a numerical illustration rather
than  as  a  confirmation  of  the  validity  of  the  large  sample  approximation 
at  a  certain  sample  size. 

Certain conditions restrict the validity of the results. Some conditions concern the regularity of the spectrum, so that the spectral density is not allowed to fluctuate too much but has to have a sufficiently smooth appearance. More serious than these are the restrictions on the distribution of the observed process. We deal only with linear processes (see section 1.6), and although this is a general class of processes and includes, for example, the usual models employed in time series analysis, it is still very far from the whole class of stationary processes. It should also be remarked that while the regularity assumptions concerning the spectrum can probably be weakened considerably, the same is likely not to be true with respect to the distribution type of the process. In fact, the crucial point in the derivations that follow is where we use the independence of the ξₙ generating the linear scheme to establish joint asymptotic normality of certain statistics. For this purpose we employ the central limit theorem for independent variables. In the case of a general stationary process with absolutely continuous spectrum the ξₙ's are still uncorrelated (see section 1.6) but not necessarily independent. Then the central limit theorem will in general not hold from the formal mathematical point of view. Countless examples of this could be given, and we choose the following simple one as an illustration. Let

$$\xi_\nu = \cos(2\pi\nu x),\qquad \nu = 0, 1, 2, \dots, \qquad (1)$$

where x is a rectangular stochastic variable in the interval (0, 1). Consider the sum

$$\sum_{\nu=0}^{n}\xi_\nu = \frac12 + \frac12\,\frac{\sin\left(n+\frac12\right)2\pi x}{\sin\frac12\,2\pi x}.$$

Note that, though the ξ's are uncorrelated so that the sum has a standard deviation of the order n^{1/2}, we have not normed the sum. The limiting distribution is that of the stochastic variable

$$\frac12 + \frac12\,\frac{\sin 2\pi y}{\sin\pi x},$$

where y is another rectangular stochastic variable in the interval (0, 1), independent of x. This distribution is far from normal.
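A short numerical check of this example follows. The sketch verifies the Dirichlet-kernel closed form for Σ_{ν=0}^{n} cos(2πνx) against direct summation. It then shows what happens if one norms the sum by its standard deviation √(n/2), as a central limit theorem would suggest: the unnormed sum stays of order one, so the normed sum collapses towards zero instead of approaching N(0, 1). The sample sizes and seed are arbitrary choices.

```python
import math
import random


def partial_sum(x, n):
    """sum_{nu=0}^{n} cos(2 pi nu x), term by term."""
    return sum(math.cos(2 * math.pi * nu * x) for nu in range(n + 1))


def dirichlet_form(x, n):
    """1/2 + (1/2) sin((n + 1/2) 2 pi x) / sin(pi x), for x in (0, 1)."""
    return 0.5 + 0.5 * math.sin((n + 0.5) * 2 * math.pi * x) / math.sin(math.pi * x)


rng = random.Random(0)
n = 100_000
# nu = 0 contributes the constant 1; each other term has variance 1/2,
# so the standard deviation of the sum is sqrt(n/2).
ratios = sorted(abs(dirichlet_form(rng.uniform(1e-9, 1 - 1e-9), n)) / math.sqrt(n / 2)
                for _ in range(5000))
median_ratio = ratios[len(ratios) // 2]
```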

The reader may object that the process considered above is not strictly stationary. We now give an example of a strictly stationary process with orthogonal random variables that is not in the domain of the central limit theorem. Let the joint probability density of ξ_j, ξ_{j+1}, …, ξ_{j+n} be given by

$$p(\xi_j,\dots,\xi_{j+n}) = \frac{1}{2(2\pi)^{(n+1)/2}}\left[\sigma_1^{n+1}\exp\Big(-\frac{\sigma_1^2}{2}\sum_{k=0}^{n}\xi_{j+k}^2\Big) + \sigma_2^{n+1}\exp\Big(-\frac{\sigma_2^2}{2}\sum_{k=0}^{n}\xi_{j+k}^2\Big)\right],$$

where j = …, −1, 0, 1, …, n = 0, 1, …, σ₁ ≠ σ₂ and σ₁, σ₂ > 0. The limiting distribution of Σ_{k=1}^{n} ξ_k/√n is

$$\lim_{n\to\infty} P\Big(\sum_{k=1}^{n}\frac{\xi_k}{\sqrt n} < a\Big) = \frac12\left[\Phi(\sigma_1 a) + \Phi(\sigma_2 a)\right],$$

where Φ is the standard normal distribution function; this is a mixture of two normal distributions. The process {ξₙ} could be called a symmetric process, since p(ξ_j, ξ_{j+1}, …, ξ_{j+n}) is a symmetric function of ξ_j, …, ξ_{j+n}. An interesting discussion of such symmetric processes is to be found in de Finetti's article [1].
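The symmetric process above can be generated in two stages. First draw σ equal to σ₁ or σ₂ with probability ½ each, then draw the ξ's independent N(0, σ⁻²) given σ. Given σ the joint density is σ^{n+1}(2π)^{−(n+1)/2}exp(−σ²Σξ²/2), so mixing over σ reproduces the equal-weight mixture. The sketch checks the normed sum against ½[Φ(σ₁a) + Φ(σ₂a)] and shows its kurtosis is well above the normal value 3. The values σ₁ = 1, σ₂ = 3, n = 20 and the seed are illustrative.

```python
import math
import random


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normed_sum(n, s1, s2, rng):
    """Draw sigma in {s1, s2} with probability 1/2 each, then xi_1..xi_n iid N(0, 1/sigma^2);
    return sum(xi)/sqrt(n). The xi's are orthogonal and exchangeable but not independent."""
    s = s1 if rng.random() < 0.5 else s2
    return sum(rng.gauss(0.0, 1.0 / s) for _ in range(n)) / math.sqrt(n)


def mixture_cdf(a, s1, s2):
    """(1/2)[Phi(s1 a) + Phi(s2 a)]."""
    return 0.5 * (std_normal_cdf(s1 * a) + std_normal_cdf(s2 * a))


s1, s2, n = 1.0, 3.0, 20
rng = random.Random(42)
draws = [normed_sum(n, s1, s2, rng) for _ in range(20000)]
m2 = sum(d * d for d in draws) / len(draws)
m4 = sum(d**4 for d in draws) / len(draws)
kurtosis = m4 / m2**2  # about 3 for a single normal law; about 4.9 for this mixture
```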

However, the experimentalist would argue that in most physically realizable situations where a stationary process has been observed during a time interval long compared to the time lags for which correlation is appreciable, the sample average would be asymptotically normally distributed. Intuitively it seems likely that a strong mixing condition (see Hopf [1] and Rosenblatt [3]) would ensure the validity of the central limit theorem. Unfortunately, none of the extensions of the central limit theorem to dependent variables seems to answer this problem in terms well adapted for practical interpretation.

Let us return for a moment to the uncorrelated stochastic variables (1). For these variables the central limit theorem is not valid. However, if one considers a subsequence ξ_{n_k}, k = 1, 2, …, where the gaps n_{k+1} − n_k are sufficiently large, the central limit theorem will hold. In particular, it is enough to require that a Hadamard gap condition be satisfied, i.e.,

$$\frac{n_{k+1}}{n_k} \ge q > 1$$

from some value of k on (for an elegant survey of this and related results see M. Kac [1]). As the gaps become larger with increasing k, the variables behave more and more like independent stochastic variables, and this is the reason why the central limit theorem holds.
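The effect of the gap condition can be seen by simulation. The sketch compares the lacunary frequencies n_k = 2^k (Hadamard gaps with q = 2) with the consecutive frequencies n_k = k. Both sums are normed by √(m/2), so each has mean 0 and variance 1. For a standard normal variable, P(|Z| < 1) ≈ 0.683. The lacunary sum comes close to this value, while the consecutive (Dirichlet-kernel) sum piles up near zero with heavy tails. The choices m = 20, the number of replications and the seed are arbitrary.

```python
import math
import random


def normed_sum(x, freqs):
    """sum_k cos(2 pi n_k x) / sqrt(m/2): mean 0, variance 1 for x ~ U(0, 1)."""
    return sum(math.cos(2 * math.pi * f * x) for f in freqs) / math.sqrt(len(freqs) / 2)


def fraction_within_one(values):
    return sum(abs(v) < 1 for v in values) / len(values)


m, reps = 20, 20000
lacunary = [2**k for k in range(1, m + 1)]  # Hadamard gaps, q = 2
consecutive = list(range(1, m + 1))         # no gaps
rng = random.Random(7)
gap_draws = [normed_sum(rng.random(), lacunary) for _ in range(reps)]
dense_draws = [normed_sum(rng.random(), consecutive) for _ in range(reps)]
p_gap = fraction_within_one(gap_draws)
p_dense = fraction_within_one(dense_draws)
```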

Summing up, we feel that the results of this chapter ought to be valid if (a) the spectrum is not too peaked and (b) the process is in the domain of the central limit theorem. Sampling experiments should be of great help in deciding the sample size and type of process for which the results have practical value.

Unfortunately  the  proofs  are  carried  out  only  for  a  discrete  parameter 
process  which  may  seem  a  serious  limitation  as  in  most  of  the  applications 
in  Chapter  5  the  process  has  a  continuous  parameter.  However,  it  seems 
likely  that  the  analogous  results  hold  for  continuous  time  although  the 
proofs  might  require  a  more  intricate  apparatus.  One  should  also  note  that 
in  situations  where  great  accuracy  in  determining  the  structure  of  the 
spectrum  is  required,  it  is  necessary  to  use  a  digital  computer.  In  that  case 
the  process  has  to  be  discretized.  The  analog  computers  could  utilize  all  the 
information  in  a  continuous  time  series  but  are  likely  to  have  larger 
computational  errors.  Which  procedure  one  should  choose  will  depend  upon 
the  required  accuracy  and  the  amount  of  time  one  is  willing  to  spend  on 
numerical  computations. 

6.2.  A  Heuristic  Derivation  of  a  Limit  Theorem 

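Consider the estimate

$$F_N^*(\lambda) = \int_0^\lambda I_N(x)\,dx = \frac{c_0}{2\pi N}\,\lambda + \frac{1}{\pi}\sum_{\nu=1}^{N-1}\frac{c_\nu}{N}\,\frac{\sin\nu\lambda}{\nu}$$

of the spectral distribution function F(λ), where

$$c_\nu = \sum_{k=1}^{N-\nu} x_k x_{k+\nu}.$$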

In sections 6.3 to 6.5 we shall obtain the asymptotic distribution of expressions like

$$\sqrt N\max_{0\le\lambda\le\pi}|F_N^*(\lambda) - F(\lambda)|,\qquad N\int_0^\pi |F_N^*(x) - F(x)|^2\,dx,$$

and of the estimated spectral mass in several nonoverlapping λ-intervals.
These derivations will be carried out rigorously and are valid under the assumption that the stationary process x_t is a linear scheme. The rigorous derivations will be lengthy, and it may be difficult at times to see the basic ideas under the formal apparatus used to push the proofs through. For this reason we preface these sections with a heuristic derivation of the asymptotic distribution of

$$\sqrt N\max_{0\le\lambda\le\pi}|F_N^*(\lambda) - F(\lambda)|.$$

These preliminary comments will emphasize the basic ideas of all these derivations.

A first step in such a heuristic derivation (another heuristic derivation is to be found in Grenander and Rosenblatt [4]) is to note that

$$\sqrt N\int_0^\lambda I_N(x)\,dx = \sqrt N\int_0^\lambda 2\pi f(x)\,I_{N,\xi}(x)\,dx + o(1), \qquad (1)$$

where o(1) is uniform in λ and I_{N,ξ}(x) is the periodogram computed for the process ξ_ν which generates the observed stochastic process x_t = Σ_ν a_{t−ν}ξ_ν. This can be made convincing by noting that


$$\sqrt N\int_0^\lambda\left[I_N(x) - 2\pi f(x)\,I_{N,\xi}(x)\right]dx = \frac{1}{\sqrt N}\sum_{r,s=-\infty}^{\infty} a_r a_s\, d_{rs},$$

where

$$d_{rs} = \sum_{n,m=1}^{N}\xi_{n-r}\,\xi_{m-s}\,K_\lambda(n-m) - \sum_{n=1+r}^{N+r}\ \sum_{m=1+s}^{N+s}\xi_{n-r}\,\xi_{m-s}\,K_\lambda(n-m),\qquad K_\lambda(k) = \begin{cases}\dfrac{e^{ik\lambda}-1}{2\pi i k}, & k \ne 0,\\[4pt] \dfrac{\lambda}{2\pi}, & k = 0.\end{cases}$$

The terms d_{rs} will be negligible unless |r|, |s| are large, and then the coefficients a_r a_s will be small since Σ|a_r|² < ∞. One therefore expects the difference to be negligible and (1) to hold. We can then instead consider the process


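$$\sqrt N\left[\int_0^\lambda 2\pi f(x)\,I_{N,\xi}(x)\,dx - F(\lambda)\right] = \sqrt N\int_0^\lambda 2\pi f(x)\left[I_{N,\xi}(x) - \frac{1}{2\pi}\right]dx = \frac{c_0 - N}{\sqrt N}\int_0^\lambda f(x)\,dx + 2\sum_{\nu=1}^{N-1}\frac{c_\nu}{\sqrt N}\int_0^\lambda f(x)\cos\nu x\,dx, \qquad (2)$$

where now

$$c_\nu = \sum_{k=1}^{N-\nu}\xi_k\,\xi_{k+\nu}.$$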

If the spectral density is well behaved, the coefficients ∫₀^λ f(x) cos νx dx will decrease to zero as ν → ∞ at the rate O(1/ν). We can then get a good approximation to the distribution of the process (1) by considering a fixed but large number k of the initial terms of the expansion (2), since the remainder contributes little; that is, we should consider


$$\frac{c_0 - N}{\sqrt N}\int_0^\lambda f(x)\,dx + 2\sum_{\nu=1}^{k}\frac{c_\nu}{\sqrt N}\int_0^\lambda f(x)\cos\nu x\,dx. \qquad (3)$$


But the stochastic variables (c₀ − N)/√N, …, c_k/√N are the normed sums of a k-step dependent stationary vector process (that is, stochastic variables more than k steps apart are independent) with mean zero, orthogonal components, and variances e + 2, 1, 1, …, 1 (here e + 2 = Var ξ_n² = Eξ_n⁴ − 1). The central limit theorem holds for k-step dependent stationary processes, so the stochastic variables (c₀ − N)/√N, …, c_k/√N are asymptotically normal and independent, with mean zero and variances e + 2, 1, …, 1, as N → ∞. The truncated process (3) should asymptotically have the same distribution as


$$\sqrt{e+2}\;y_0\int_0^\lambda f(x)\,dx + 2\sum_{\nu=1}^{k} y_\nu\int_0^\lambda f(x)\cos\nu x\,dx,$$

and on letting k → ∞ we see that the process should asymptotically have the same distribution as


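$$\sqrt{e+2}\;y_0\int_0^\lambda f(x)\,dx + 2\sum_{\nu=1}^{\infty} y_\nu\int_0^\lambda f(x)\cos\nu x\,dx, \qquad (4)$$

where y₀, y₁, … are normal, independent variables with mean zero and variance one. On the other hand, the stochastic model of linear Brownian motion can be written as

$$B(\lambda) = \frac{y_0\lambda}{\sqrt\pi} + \sqrt{\frac{2}{\pi}}\sum_{\nu=1}^{\infty} y_\nu\,\frac{\sin\nu\lambda}{\nu},\qquad 0 \le \lambda \le \pi$$

(see section 1.1 of Chapter 1). The process (1) can then formally be written as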

$$\sqrt e\;y\int_0^\lambda f(x)\,dx + \sqrt{2\pi}\int_0^\lambda f(x)\,dB(x) \qquad (5)$$

if e ≥ 0, where B(x) is the linear Brownian motion and y is a normal variable of mean zero and variance one, independent of B(x). The term on the right in formula (5) is not well defined sample-functionwise, but we really mean by it the stochastic process obtained from it by integrating by parts. The resultant expression will then be well defined sample-functionwise. Intuitively it is clear that the limiting probability

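$$\lim_{N\to\infty} P\left(\sqrt N\max_{0\le\lambda\le\pi}|F_N^*(\lambda) - F(\lambda)| < \alpha\right) = P\left(\max_{0\le\lambda\le\pi}\left|\sqrt{e+2}\;y_0\int_0^\lambda f(x)\,dx + 2\sum_{\nu=1}^{\infty} y_\nu\int_0^\lambda f(x)\cos\nu x\,dx\right| < \alpha\right)$$

and that it is equal to

$$P\left(\max_{0\le\lambda\le\pi}\left|\sqrt e\;y\int_0^\lambda f(x)\,dx + \sqrt{2\pi}\int_0^\lambda f(x)\,dB(x)\right| < \alpha\right)$$

when e ≥ 0. Note that (4) is a normal process with mean zero and covariance

$$e\,F(\lambda)F(\mu) + 2\pi\int_0^{\min(\lambda,\mu)} f^2(x)\,dx.$$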

The case e = 0 is of particular practical interest because the limiting distribution can then be computed with ease. In particular, e = 0 when x_t is a normal process.
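The covariance of (4) can be checked numerically. The series form of (4) gives (e + 2)F(λ)F(μ) + 4Σ_{ν≥1} a_ν(λ)a_ν(μ), with a_ν(λ) = ∫₀^λ f(x) cos νx dx. Parseval's identity for the cosine system on [0, π] turns this into eF(λ)F(μ) + 2π∫₀^{min(λ,μ)} f²(x)dx. The sketch compares both forms for a hypothetical MA(1) density f(x) = (1 + a² + 2a cos x)/(2π), chosen because every integral then has a closed form. The truncation point K and the parameter values are arbitrary.

```python
import math


def F(lam, a):
    """F(lam) = int_0^lam f(x) dx for f(x) = (1 + a^2 + 2 a cos x) / (2 pi)."""
    return ((1 + a * a) * lam + 2 * a * math.sin(lam)) / (2 * math.pi)


def cos_coeff(nu, lam, a):
    """a_nu(lam) = int_0^lam f(x) cos(nu x) dx in closed form, nu >= 1."""
    b = 1 + a * a
    base = b * math.sin(nu * lam) / nu
    if nu == 1:
        cross = a * (lam + math.sin(2 * lam) / 2)
    else:
        cross = a * (math.sin((nu + 1) * lam) / (nu + 1) + math.sin((nu - 1) * lam) / (nu - 1))
    return (base + cross) / (2 * math.pi)


def cov_series(lam, mu, a, e, K=20000):
    """(e + 2) F(lam) F(mu) + 4 sum_{nu=1}^{K} a_nu(lam) a_nu(mu), truncated at K."""
    s = sum(cos_coeff(nu, lam, a) * cos_coeff(nu, mu, a) for nu in range(1, K + 1))
    return (e + 2) * F(lam, a) * F(mu, a) + 4 * s


def cov_closed(lam, mu, a, e):
    """e F(lam) F(mu) + 2 pi int_0^{min(lam, mu)} f(x)^2 dx."""
    m, b = min(lam, mu), 1 + a * a
    integral = b * b * m + 4 * a * b * math.sin(m) + 2 * a * a * m + a * a * math.sin(2 * m)
    return e * F(lam, a) * F(mu, a) + integral / (2 * math.pi)
```

For white noise (a = 0) with e = 0, the covariance reduces to min(λ, μ)/(2π), that of B(λ)/√(2π).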

The heuristic derivation given above is an abbreviated version of the rigorous proof to be given in the following sections. There are several places where the argument is accompanied by a rather convincing waving of the hands. These are the gaps that are filled in and reasoned out in detail in a strict proof. A few of these gaps are worth noting. One of these occurred when √N[F*_N(λ) − F(λ)] was approximated by a finite number of terms in its expansion in terms of the product lag moments of the ξ process. Another gap is to be found in the assertion that the limiting probability distribution of the maximum of √N|F*_N(λ) − F(λ)| is the same as the probability distribution of the maximum of the absolute value of the process which is the limit of √N[F*_N(λ) − F(λ)] distributionwise.

6.3.  Preliminary  Considerations 

As remarked before, one of the primary objects of the following sections is to investigate the asymptotic distribution of

$$\max_{0\le\lambda\le\pi}\sqrt N\,|F_N^*(\lambda) - F(\lambda)|$$

when the observed process x_t is a linear scheme, that is,

$$x_t = \sum_\nu a_{t-\nu}\,\xi_\nu,\qquad \sum_\nu a_\nu^2 < \infty,$$

where the ξ_ν are independent, identically distributed stochastic variables with mean zero and variance one. The basic idea of the derivation is first to treat the special case of the process x_t = ξ_t and then to reduce the general case of a linear scheme to this special case. Various auxiliary assumptions will be introduced occasionally, and they will be carefully indicated when they appear. However, one does feel that some of these auxiliary assumptions are not really necessary.

In this section, a few results required in the main body of the proof will be demonstrated. It is assumed that E|ξₙ|⁸ < ∞ throughout the proof.

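Lemma 1. Consider the covariances

$$c_\nu = \sum_{n=1}^{N-|\nu|}\xi_n\,\xi_{n+|\nu|}.$$

If ν, μ, j ≠ 0, then

$$\left|E\,c_\nu c_{\nu+j} c_\mu c_{\mu+j}\right| \le \begin{cases}A_1 N & \text{if } \mu \ne \nu,\\ A_2 N^2 & \text{if } \mu = \nu.\end{cases} \qquad (1)$$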

Proof. It is clear that (1) is made up of terms E ∏_{i=1}^{8} ξ_{n_i}. If one of the indices n_i is different from the rest, this term vanishes. Since each of the terms is bounded by E|ξₙ|⁸, it is enough to enumerate the nonvanishing terms. But we have restraints, say

$$n_2 = n_1 + a,\qquad n_4 = n_3 + b,\qquad n_6 = n_5 + c,\qquad n_8 = n_7 + d,$$

where the integers a, b, c, d are all different from zero. In particular one can set a = ν, b = ν + j, c = μ, d = μ + j. The eight variables can then be treated in a completely symmetric way. Let us fix n₁. As n₂ ≠ n₁, n₂ has to be equal to some other n_i, say n₃.

Now we separate two cases. Since n₅ ≠ n₆, n₅ has to be equal either to one of n₁, n₂, n₃, n₄ or to one of n₇, n₈. Consider the first alternative. Then n₇ has to be equal to one of n₁, n₂, n₃, n₄, n₅, n₆, and whichever is chosen, we have

$$n_i = n_1 + a_i,\qquad i = 2, 3, \dots, 8. \qquad (2)$$

As 1 ≤ n_i ≤ N, this gives us at most N possibilities.

In treating the second alternative, assume that n₅ = n₇. If any of the indices n₁, n₂, n₃, n₄ is equal to any of the indices n₅, n₆, n₇, n₈, we again have a set of restrictions of the type (2) and hence at most N possibilities, so we can exclude this case. But then n₂ = n₃ ≠ n₁ and n₆ ≠ n₇ = n₅, so that the only way of getting a nonvanishing term is by setting n₂ = n₄, n₆ = n₈, which implies that a = b, c = d. If that is the case we have at most N² possibilities. The result clearly follows.

Note that the distribution of the stochastic variables (c₀ − N)/√N, c₁/√N, …, c_k/√N tends to the distribution of k + 1 normal and independent variables with mean zero and variances e + 2, 1, 1, …, 1, respectively. This can easily be seen in the following way. Introduce

$$z_j = t_0(\xi_j^2 - 1) + t_1\,\xi_j\xi_{j+1} + \dots + t_k\,\xi_j\xi_{j+k},$$

so that

$$U_N = \frac{1}{\sqrt N}\sum_{j=1}^{N} z_j = \frac{1}{\sqrt N}\left[t_0(c_0 - N) + t_1c_1' + \dots + t_kc_k'\right],$$

where

$$c_\nu' = c_\nu + \sum_{j=N-\nu+1}^{N}\xi_j\,\xi_{j+\nu},\qquad \nu > 0.$$

It is clear that (c′_ν − c_ν)/√N → 0 in probability as N → ∞. But z_j is a stationary (k + 1)-dependent sequence of stochastic variables. Applying a theorem of Hoeffding and Robbins [1], we see that U_N is asymptotically normal with mean zero and variance

$$t_0^2(e+2) + \sum_{\nu=1}^{k} t_\nu^2.$$

Thus

$$E\,e^{iU_N} = E\exp\left\{\frac{i}{\sqrt N}\left[t_0(c_0 - N) + \sum_{\nu=1}^{k} t_\nu c_\nu'\right]\right\} \to \exp\left\{-\frac12\left[t_0^2(e+2) + \sum_{\nu=1}^{k} t_\nu^2\right]\right\},$$

and the statement concerning the asymptotic behavior of

$$\frac{c_0 - N}{\sqrt N},\ \frac{c_1}{\sqrt N},\ \dots,\ \frac{c_k}{\sqrt N}$$

is demonstrated.
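The limiting covariance structure can be illustrated by simulation. The sketch uses ξ uniform on [−√3, √3], so that Eξ = 0, Eξ² = 1 and Eξ⁴ = 9/5. The variance of (c₀ − N)/√N is therefore Var ξ² = Eξ⁴ − 1 = 4/5, i.e. e + 2 = 0.8. The sketch then checks that c₁/√N has variance close to 1 and is nearly uncorrelated with (c₀ − N)/√N. The uniform law, N = 200, the number of replications and the seed are illustrative choices.

```python
import math
import random


def normed_lag_stats(N, rng):
    """Return ((c_0 - N)/sqrt(N), c_1/sqrt(N)) for N iid xi uniform on [-sqrt 3, sqrt 3]."""
    r3 = math.sqrt(3.0)
    xi = [rng.uniform(-r3, r3) for _ in range(N)]
    c0 = sum(v * v for v in xi)
    c1 = sum(xi[k] * xi[k + 1] for k in range(N - 1))
    return (c0 - N) / math.sqrt(N), c1 / math.sqrt(N)


def mean(values):
    return sum(values) / len(values)


def var(values):
    return mean([v * v for v in values]) - mean(values) ** 2


rng = random.Random(11)
pairs = [normed_lag_stats(200, rng) for _ in range(2000)]
u = [p[0] for p in pairs]
w = [p[1] for p in pairs]
corr = (mean([a * b for a, b in zip(u, w)]) - mean(u) * mean(w)) / math.sqrt(var(u) * var(w))
```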
6.4. Treatment of Pure White Noise

We now discuss the limiting probability distribution of

$$\sqrt N\max_{0\le\lambda\le\pi}\left|F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\right|,$$

that is, we discuss the probability distribution of the statistic of interest when the observed process x_t = ξ_t is pure white noise.

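Theorem 1. Consider the empirical spectral distribution function

$$F_{N,\xi}^*(\lambda) = \int_0^\lambda I_{N,\xi}(l)\,dl,$$

where

$$I_{N,\xi}(\lambda) = \frac{1}{2\pi N}\left|\sum_{\nu=1}^{N}\xi_\nu\,e^{-i\nu\lambda}\right|^2.$$

The limiting probability distribution of

$$\max_{0\le\lambda\le\pi}\sqrt N\left|F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\right|$$

as N → ∞ is the same as that of

$$\max_{0\le\lambda\le\pi}|\zeta(\lambda)|,$$

where ζ(λ) is a normally distributed process with mean zero and covariance

$$E\,\zeta(\lambda)\zeta(\mu) = \frac{e\,\lambda\mu}{4\pi^2} + \frac{\min(\lambda,\mu)}{2\pi}.$$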

Proof. Now

$$\sqrt N\left[F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\right] = \frac{\lambda}{2\pi}\,\frac{c_0 - N}{\sqrt N} + \frac{1}{\pi}\sum_{\nu=1}^{N-1}\frac{c_\nu}{\sqrt N}\,\frac{\sin\nu\lambda}{\nu} = s_{N,k}(\lambda) + r_{N,k}(\lambda),$$

where s_{N,k}(λ) consists of the first term and the first k − 1 summands of the sum. We shall show that s_{N,k}(λ) is a good approximation to √N[F*_{N,ξ}(λ) − λ/2π] uniformly in λ when k is fixed but large, or, more formally, that with probability close to one, for sufficiently large k, |r_{N,k}(λ)| is small uniformly in N, λ.

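We now carry out a detailed estimation of the remainder term to prove the validity of this approximation. Consider

$$\max_{0\le\lambda\le\pi}\left|\sum_{\nu=m}^{l-1}\frac{c_\nu}{\sqrt N}\,\frac{\sin\nu\lambda}{\nu}\right|^2 \le \max_{0\le\lambda\le\pi}\left|\sum_{\nu=m}^{l-1}\frac{c_\nu}{\sqrt N}\,\frac{e^{i\nu\lambda}}{\nu}\right|^2,$$

where 0 < m < l ≤ N. But

$$\left|\sum_{\nu=m}^{l-1}\frac{c_\nu}{\sqrt N}\,\frac{e^{i\nu\lambda}}{\nu}\right|^2 \le 2\sum_{j=0}^{l-m-1}\left|\sum_{\nu=m}^{l-1-j}\frac{c_\nu\,c_{\nu+j}}{N\nu(\nu+j)}\right|.$$

To get a bound for this sum consider

$$E\left|\sum_{\nu=m}^{l-1-j}\frac{c_\nu\,c_{\nu+j}}{N\nu(\nu+j)}\right|^2. \qquad (1)$$

We know by Lemma 1 that

$$\left|E\,c_\nu c_{\nu+j} c_\mu c_{\mu+j}\right| \le \begin{cases}A_2 N^2 & \text{if } \nu = \mu,\\ A_1 N & \text{if } \nu \ne \mu,\end{cases}$$

so that expression (1) is bounded by

$$A_2\sum_{\nu=m}^{l-1-j}\frac{1}{\nu^2(\nu+j)^2} + \frac{A_1}{N}\left(\sum_{\nu=m}^{l-1-j}\frac{1}{\nu(\nu+j)}\right)^2 \le A_3\sum_{\nu=m}^{l-1-j}\frac{1}{\nu^2(\nu+j)^2},$$

using the Schwarz inequality. Now choose m = 2^p and l = 2^{p+1}. Again using the Schwarz inequality, we have

$$E\max_{0\le\lambda\le\pi}\left|\sum_{\nu=2^p}^{2^{p+1}-1}\frac{c_\nu}{\sqrt N}\,\frac{\sin\nu\lambda}{\nu}\right|^2 \le 2\sum_{j=0}^{2^p-1}\left(A_3\sum_{\nu=2^p}^{2^{p+1}-1-j}\frac{1}{\nu^2(\nu+j)^2}\right)^{1/2} \le \frac{A_5}{2^{p/2}}.$$

Let k = 2^n. Then, with probability greater than 1 − A₅/2^{p/4},

$$T_p = \max_{0\le\lambda\le\pi}\left|\sum_{\nu=2^p}^{2^{p+1}-1}\frac{c_\nu}{\sqrt N}\,\frac{\sin\nu\lambda}{\nu}\right|^2 < \left(\frac12\right)^{p/4}.$$

Therefore

$$|r_{N,k}(\lambda)| \le \sum_{p=n}^{[\log_2 N]+1} T_p^{1/2} \le \sum_{p=n}^{\infty} 2^{-p/8}$$

with probability greater than 1 − A₆/2^{n/4}. By choosing n sufficiently large, |r_{N,k}(λ)| is made uniformly small.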
We  want  to  find  the  limiting  value  of 


I  max  VN 

\  Q<X<n 


n.tW 


2n 


<a 
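From what has just been shown, we see that for any $\varepsilon,\delta>0$

$$P\Big(\max_{0\le\lambda\le\pi}|s_{N,k}(\lambda)|<\alpha-\varepsilon\Big)-\delta\le P\Big(\max_{0\le\lambda\le\pi}\sqrt N\,\Big|F^*_{N,\xi}(\lambda)-\frac{\lambda}{2\pi}\Big|<\alpha\Big)\le P\Big(\max_{0\le\lambda\le\pi}|s_{N,k}(\lambda)|<\alpha+\varepsilon\Big)+\delta$$

for $k$ sufficiently large, uniformly in $N>N(\varepsilon,\delta)$. But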

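$$s_{N,k}(\lambda)=\frac{c_0-N}{2\pi\sqrt N}\,\lambda+\frac1\pi\sum_{\nu=1}^{k}\frac{c_\nu}{\sqrt N}\,\frac{\sin\nu\lambda}{\nu},$$

and the joint distribution of $(c_0-N)/\sqrt N,\ c_1/\sqrt N,\dots,c_k/\sqrt N$ ($k$ fixed) converges to the distribution of $k+1$ independent normal variables with common mean zero and variances $e+2,1,1,\dots,1$. Consider the related process

$$s_k(\lambda)=\frac{\sqrt{e+2}}{2\pi}\,y_0\lambda+\frac1\pi\sum_{\nu=1}^{k}y_\nu\,\frac{\sin\nu\lambda}{\nu},$$

where the $y$'s are $N(0,1)$ and independent. It is easily seen that

$$\lim_{N\to\infty}P\Big(\max_{0\le\lambda\le\pi}|s_{N,k}(\lambda)|<\alpha\Big)=P\Big(\max_{0\le\lambda\le\pi}|s_k(\lambda)|<\alpha\Big),$$

as the relevant point set in $(k+1)$-space is closed. On letting $k$ tend to infinity, $s_k(\lambda)$ converges uniformly to

$$\zeta(\lambda)=\frac{\sqrt{e+2}}{2\pi}\,y_0\lambda+\frac1\pi\sum_{\nu=1}^{\infty}y_\nu\,\frac{\sin\nu\lambda}{\nu}$$

(see Paley-Wiener [1], pp. 148-151). Then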

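$$P\Big(\max_{0\le\lambda\le\pi}|\zeta(\lambda)|<\alpha-\varepsilon\Big)-\delta\le P\Big(\max_{0\le\lambda\le\pi}\sqrt N\,\Big|F^*_{N,\xi}(\lambda)-\frac{\lambda}{2\pi}\Big|<\alpha\Big)\le P\Big(\max_{0\le\lambda\le\pi}|\zeta(\lambda)|<\alpha+\varepsilon\Big)+\delta,$$

and as $N\to\infty$ we can let $\delta,\varepsilon\to0$. Since the distribution function of $\max_{0\le\lambda\le\pi}|\zeta(\lambda)|$ is continuous (see section 6.7), this completes the proof.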
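The covariance of $\zeta$ is not written out here, but it follows from the series. For $0\le\lambda\le\mu\le\pi$ one has $\sum_{\nu\ge1}\sin\nu\lambda\sin\nu\mu/\nu^2=\lambda(\pi-\mu)/2$, which gives

$$E\,\zeta(\lambda)\zeta(\mu)=\frac{e\lambda\mu}{4\pi^2}+\frac{\min(\lambda,\mu)}{2\pi}.$$

This is $1/4\pi^2$ times the covariance $e\lambda\mu+2\pi\min(\lambda,\mu)$ used in section 6.7. The short Python check below compares a truncated version of the series with the closed form. The truncation level and the test points are arbitrary choices.

```python
import math


def sine_series(lam, mu, terms=200_000):
    """Partial sum of sum_{v>=1} sin(v lam) sin(v mu) / v^2."""
    return sum(math.sin(v * lam) * math.sin(v * mu) / (v * v) for v in range(1, terms + 1))


def zeta_cov_series(lam, mu, e, terms=200_000):
    """E zeta(lam) zeta(mu) from the series: (e + 2) lam mu / (4 pi^2) + sine_series / pi^2."""
    return (e + 2) * lam * mu / (4 * math.pi ** 2) + sine_series(lam, mu, terms) / math.pi ** 2


def zeta_cov_closed(lam, mu, e):
    """Closed form e lam mu / (4 pi^2) + min(lam, mu) / (2 pi)."""
    return e * lam * mu / (4 * math.pi ** 2) + min(lam, mu) / (2 * math.pi)


lam, mu, e = 0.7, 2.0, 1.5
print(sine_series(lam, mu), lam * (math.pi - mu) / 2)
print(zeta_cov_series(lam, mu, e), zeta_cov_closed(lam, mu, e))
```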


6.5.  The  General  Theorem 

The object is now to reduce the discussion of a general linear process $x_t$ to the case of pure white noise discussed in the previous section. This reduction is carried out in two steps. The first step is to note that, under certain conditions,

$$\sqrt N\max_{0\le\lambda\le\pi}\Big|\int_0^\lambda[I_N(l)-2\pi f(l)I_{N,\xi}(l)]\,dl\Big|=o(1)$$

in probability. Then the distribution of

$$\max_{0\le\lambda\le\pi}\sqrt N\,\Big|\int_0^\lambda f(l)[2\pi I_{N,\xi}(l)-1]\,dl\Big|$$

is discussed, and finally the asymptotic distribution of

$$\sqrt N\max_{0\le\lambda\le\pi}|F^*_N(\lambda)-F(\lambda)|$$

is obtained.

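Theorem 2. Let $a_\nu=O(|\nu|^\beta)$, where $\beta<-3/2$. Then, given any $\varepsilon>0$,

$$\sqrt N\max_{0\le\lambda\le\pi}\Big|\int_0^\lambda[I_N(l)-2\pi f(l)I_{N,\xi}(l)]\,dl\Big|<\varepsilon$$

with probability $1-\varepsilon$ for $N>N(\varepsilon)$.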

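Proof. Using the representation of $x_t$ as a linear scheme

$$x_t=\sum_\nu a_{t-\nu}\,\xi_\nu$$

and expanding, we find that

$$2\pi\sqrt N\int_0^\lambda[I_N(l)-2\pi f(l)I_{N,\xi}(l)]\,dl=\frac{1}{\sqrt N}\sum_{r,s=-\infty}^{\infty}a_ra_s\,d_{r,s}, \qquad (1)$$

where

$$d_{r,s}={\sum_{n,m=1}^{N}}'\,\xi_{n-r}\xi_{m-s}\,\frac{e^{i(n-m)\lambda}-1}{i(n-m)}-{\sum_{n=1+r}^{N+r}}'\,{\sum_{m=1+s}^{N+s}}'\,\xi_{n-r}\xi_{m-s}\,\frac{e^{i(n-m)\lambda}-1}{i(n-m)}.$$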

The summation signs are primed to indicate that the coefficients of the terms with $n=m$ should be interpreted as $\lambda$. There may be a lattice rectangle $R^{(N)}_{r,s}$ of points $(n,m)$ common to the two sums that add up to $d_{r,s}$. Call the complement of $R^{(N)}_{r,s}$, with respect to the set consisting of all the lattice points in both summations, $C^{(N)}_{r,s}$. We then have

$$E\max_{0\le\lambda\le\pi}|d_{r,s}|\le\sum_{(n,m)\in C^{(N)}_{r,s}}g(n-m),$$

where

$$g(x)=\begin{cases}2/|x| & \text{if } x\ne0,\\ \pi & \text{if } x=0.\end{cases}$$

One can verify that

$$\sum_{(n,m)\in C^{(N)}_{r,s}}g(n-m)\le\begin{cases}2N\log N & \text{if } |r|>N \text{ or } |s|>N,\\ 4(|s|+|r|)\log N & \text{otherwise.}\end{cases}$$


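The expectation of the absolute value of expression (1) is then bounded by the following expression:

$$\frac{2}{\sqrt N}\sum_{s=-\infty}^{\infty}\sum_{|r|>N}|a_ra_s|\,2N\log N+\frac{8}{\sqrt N}\sum_{s=-\infty}^{\infty}\sum_{|r|\le N}|a_ra_s\,r|\log N\le\Big(\sum_{s=-\infty}^{\infty}|a_s|\Big)\Big[4\sqrt N\log N\sum_{|r|>N}|a_r|+\frac{8\log N}{\sqrt N}\sum_{|r|\le N}|a_r\,r|\Big].$$

Under the assumptions made, $\sum_s|a_s|<\infty$. The first term in brackets is $O(N^{\beta+3/2}\log N)$, and the second is $O(N^{\beta+3/2}\log N+N^{-1/2}\log^2N)$. Hence this expression tends to zero as $N\to\infty$, which proves the theorem.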


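Theorem 3. Let $f(\lambda)$ be a nonnegative absolutely continuous function. Then

$$\lim_{N\to\infty}P\Big\{\max_{0\le\lambda\le\pi}\sqrt N\,\Big|\int_0^\lambda f(l)\{2\pi I_{N,\xi}(l)-1\}\,dl\Big|<\alpha\Big\}=P\Big\{\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha\Big\},$$

where $\eta(\lambda)$ is the normal process with mean zero and

$$E\,\eta(\lambda)\eta(\mu)=e\,F(\lambda)F(\mu)+2\pi\,G(\min(\lambda,\mu)).$$

Here

$$G(\lambda)=\int_0^\lambda f^2(l)\,dl.$$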


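Proof. On integrating by parts we find that

$$\int_0^\lambda f(l)\{2\pi I_{N,\xi}(l)-1\}\,dl=2\pi f(\lambda)\Big[F^*_{N,\xi}(\lambda)-\frac{\lambda}{2\pi}\Big]-2\pi\int_0^\lambda f'(l)\Big[F^*_{N,\xi}(l)-\frac{l}{2\pi}\Big]\,dl.$$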
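Theorem 1 tells us that for any $\varepsilon>0$

$$\max_{0\le\lambda\le\pi}\Big|\sqrt N\Big(F^*_{N,\xi}(\lambda)-\frac{\lambda}{2\pi}\Big)-s_{N,k}(\lambda)\Big|<\varepsilon$$

with probability $1-\varepsilon$, uniformly in $N>N(\varepsilon)$, for $k$ sufficiently large. But then

$$\max_{0\le\lambda\le\pi}\Big|2\pi f(\lambda)\Big[\sqrt N\Big(F^*_{N,\xi}(\lambda)-\frac{\lambda}{2\pi}\Big)-s_{N,k}(\lambda)\Big]-2\pi\int_0^\lambda f'(l)\Big[\sqrt N\Big(F^*_{N,\xi}(l)-\frac{l}{2\pi}\Big)-s_{N,k}(l)\Big]dl\Big|<B_1\varepsilon$$

with probability $1-\varepsilon$, where $B_1=2\pi\big(\max_{0\le\lambda\le\pi}f(\lambda)+\int_0^\pi|f'(l)|\,dl\big)$. However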


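$$P\Big\{\max_{0\le\lambda\le\pi}2\pi\Big|f(\lambda)s_{N,k}(\lambda)-\int_0^\lambda f'(l)s_{N,k}(l)\,dl\Big|<\alpha\Big\}\to P\Big\{\max_{0\le\lambda\le\pi}2\pi\Big|f(\lambda)s_k(\lambda)-\int_0^\lambda f'(l)s_k(l)\,dl\Big|<\alpha\Big\}$$

as $N\to\infty$, since the relevant point set in $(k+1)$-space is again closed. We know that for any $\varepsilon>0$

$$\max_{0\le\lambda\le\pi}|s_k(\lambda)-\zeta(\lambda)|<\varepsilon$$

with probability $1-\varepsilon$ for sufficiently large $k$. Let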


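$$\eta(\lambda)=2\pi\Big[f(\lambda)\zeta(\lambda)-\int_0^\lambda f'(l)\zeta(l)\,dl\Big].$$

A direct computation shows that this process has mean zero and the covariance stated in Theorem 3. Then

$$P\Big\{\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha-\varepsilon\Big\}-\delta\le P\Big\{\max_{0\le\lambda\le\pi}\sqrt N\,\Big|\int_0^\lambda f(l)\{2\pi I_{N,\xi}(l)-1\}\,dl\Big|<\alpha\Big\}\le P\Big\{\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha+\varepsilon\Big\}+\delta,$$

and on letting $N\to\infty$, so that $\varepsilon,\delta\to0$, we obtain the desired result. Note that the event $\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha$ has a well-defined probability, since $\eta(\lambda)$ is a process with continuous sample functions with probability one.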
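The integration-by-parts identity that opens the proof of Theorem 3 can be verified numerically for a particular $f$. The sketch below is our illustration and does the following:

- simulates Gaussian white noise;
- builds $I_{N,\xi}$ and $F^*_{N,\xi}$ from the lagged products;
- takes the hypothetical weight $f(l)=1+\tfrac12\cos l$;
- computes both sides with Simpson's rule.

```python
import math
import random


def lagged_products(xi):
    n = len(xi)
    return [sum(xi[t] * xi[t + v] for t in range(n - v)) for v in range(n)]


def make_I_and_F(xi):
    """Return I_{N,xi}(l) and F*_{N,xi}(l) built from the lagged products."""
    n, c = len(xi), lagged_products(xi)

    def I(l):
        return (c[0] + 2 * sum(c[v] * math.cos(v * l) for v in range(1, n))) / (2 * math.pi * n)

    def F(l):
        return (c[0] * l + 2 * sum(c[v] * math.sin(v * l) / v for v in range(1, n))) / (2 * math.pi * n)

    return I, F


def simpson(g, a, b, steps=4000):
    h = (b - a) / steps
    return (g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, steps))) * h / 3


def f(l):  # illustrative smooth weight function (our choice)
    return 1.0 + 0.5 * math.cos(l)


def df(l):  # its derivative f'(l)
    return -0.5 * math.sin(l)


random.seed(3)
I, F = make_I_and_F([random.gauss(0.0, 1.0) for _ in range(24)])
lam = 2.0
left = simpson(lambda l: f(l) * (2 * math.pi * I(l) - 1), 0.0, lam)
right = (2 * math.pi * f(lam) * (F(lam) - lam / (2 * math.pi))
         - 2 * math.pi * simpson(lambda l: df(l) * (F(l) - l / (2 * math.pi)), 0.0, lam))
print(left, right)
```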

We  now  put  the  pieces  together  and  state  the  following  general  theorem: 

Theorem 4. Let

1. $f(\lambda)$ be absolutely continuous.

2. $a_\nu=O(|\nu|^\beta)$, $\beta<-3/2$.

Then

$$\lim_{N\to\infty}P\Big\{\max_{0\le\lambda\le\pi}\sqrt N\,|F^*_N(\lambda)-F(\lambda)|<\alpha\Big\}=P\Big\{\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha\Big\}. \qquad (2)$$

This general theorem follows immediately from Theorems 2 and 3. Condition 2 of this theorem is a little disquieting. We know that $a(\lambda)=\sum_\nu a_\nu e^{-i\nu\lambda}$ is some square root of $2\pi f(\lambda)$, that is, $|a(\lambda)|^2=2\pi f(\lambda)$. It is most reasonable to take $a(\lambda)=\sqrt{2\pi f(\lambda)}$, since one could then relate the asymptotic behavior of the $a_\nu$'s to the regularity properties of $f(\lambda)$. This can be done in the case of a normal process, since there it makes no difference which square root of $2\pi f(\lambda)$ is taken. In the case of a general linear process, however, it will make a difference: in general there will be only one square root of $2\pi f(\lambda)$ which corresponds to the representation of $x_t$ as a linear process.

Still, as noted in section 6.1, it seems intuitively plausible that this result ought to be valid if the spectrum is regular (condition 1 of Theorem 4) and if the observed process $x_t$ satisfies the central limit theorem. Essentially any moving average of a stationary process satisfying the central limit theorem will also satisfy the central limit theorem. It therefore seems likely that for such processes it will not make any difference which square root of $2\pi f(\lambda)$ is chosen. In particular, one would expect to be able to take $a(\lambda)=\sqrt{2\pi f(\lambda)}$ and have one condition specifying the regularity of the spectrum.

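It is of interest to note that $e$ is identifiable. The mean value is $E\,x_t^2=\sum_{\nu=-\infty}^{\infty}a_{t-\nu}^2=r_0$, so that

$$s_\mu=\operatorname{cov}(x_t^2,x_{t+\mu}^2)=e\sum_{\nu=-\infty}^{\infty}a_{t-\nu}^2a_{t+\mu-\nu}^2+2r_\mu^2.$$

Now sum over $\mu$, using $\sum_\mu r_\mu^2=2\pi\int_{-\pi}^{\pi}f^2(\lambda)\,d\lambda$ and $\sum_\mu\sum_\nu a_{t-\nu}^2a_{t+\mu-\nu}^2=r_0^2=\big(\int_{-\pi}^{\pi}f(\lambda)\,d\lambda\big)^2$. Then

$$e=\frac{\sum_{\mu=-\infty}^{\infty}s_\mu-4\pi\int_{-\pi}^{\pi}f^2(\lambda)\,d\lambda}{\Big(\int_{-\pi}^{\pi}f(\lambda)\,d\lambda\Big)^2}, \qquad (3)$$

so that $e$ is identifiable. One can in fact construct a consistent estimate for $e$ using (3). However, this estimate will not converge rapidly.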
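Formula (3) can be checked on a finite moving average where all quantities are known exactly. In the Python sketch below, the coefficients $a_0,a_1,a_2$ and the value of $e$ are hypothetical. The sketch proceeds in three steps:

- $s_\mu$ is computed from the moment formula above;
- $\int f$ and $\int f^2$ are obtained by numerical integration of $f(\lambda)=|\sum_\nu a_\nu e^{-i\nu\lambda}|^2/2\pi$;
- (3) should then return the $e$ that was put in.

A practical estimate would replace $s_\mu$ and $f$ by sample quantities, which is where the slow convergence mentioned above enters.

```python
import cmath
import math


def autocov(a, mu):
    """r_mu = sum_v a_v a_{v+|mu|} for a finite coefficient list."""
    mu = abs(mu)
    return sum(a[v] * a[v + mu] for v in range(len(a) - mu))


def s_cov(a, mu, e):
    """s_mu = cov(x_t^2, x_{t+mu}^2) = e sum_v a_v^2 a_{v+mu}^2 + 2 r_mu^2."""
    m = abs(mu)
    fourth = sum(a[v] ** 2 * a[v + m] ** 2 for v in range(len(a) - m))
    return e * fourth + 2 * autocov(a, mu) ** 2


def spectral_density(a, lam):
    z = sum(av * cmath.exp(-1j * v * lam) for v, av in enumerate(a))
    return abs(z) ** 2 / (2 * math.pi)


def simpson(g, lo, hi, steps=2000):
    h = (hi - lo) / steps
    return (g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, steps))) * h / 3


def recover_e(a, e):
    """Apply formula (3) to exact moments of the moving average with coefficients a."""
    q = len(a) - 1
    sum_s = sum(s_cov(a, mu, e) for mu in range(-q, q + 1))
    int_f = simpson(lambda l: spectral_density(a, l), -math.pi, math.pi)
    int_f2 = simpson(lambda l: spectral_density(a, l) ** 2, -math.pi, math.pi)
    return (sum_s - 4 * math.pi * int_f2) / int_f ** 2


a = [1.0, 0.6, -0.3]  # hypothetical MA(2) coefficients
print(recover_e(a, 1.7), recover_e(a, -1.0))
```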

6.6.  The  Normal  Case 

Theorem  4  is  of  special  interest  when  xt  is  a  normal  process.  The  following 
corollary  rephrases  the  content  of  the  theorem  in  this  special  context. 

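Corollary 1. When $x_t$ is a normal process with a positive spectral density having an integrable second derivative, Theorem 4 reduces to the following statement:

$$\lim_{N\to\infty}P\Big\{\max_{0\le\lambda\le\pi}\sqrt N\,|F^*_N(\lambda)-F(\lambda)|<\alpha\Big\}=\sum_{k=-\infty}^{\infty}(-1)^k\Big[\Phi\Big(\frac{(2k+1)\alpha}{x}\Big)-\Phi\Big(\frac{(2k-1)\alpha}{x}\Big)\Big]=A\Big(\frac{\alpha}{x}\Big),$$

where $\Phi(u)$ is the normal distribution function and $x=\sqrt{2\pi G(\pi)}$.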

Proof. If the spectral density of $x_t$ is positive and has an integrable second derivative, conditions 1 and 2 of Theorem 4 are both satisfied. One can easily see that the second condition is satisfied on choosing $a(\lambda)=\sqrt{2\pi f(\lambda)}$. The parameter $e=0$ since $x_t$ is a normal process. But then $\eta(\lambda)$ reduces to the model of linear Brownian motion discussed in section 1.1, with the following changed scale of time:

$$t=2\pi G(\lambda),\qquad 0\le\lambda\le\pi,$$

so that

$$P\Big\{\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha\Big\}=P\Big\{\max_{0\le t\le2\pi G(\pi)}|B(t)|<\alpha\Big\}, \qquad (1)$$

where $B(t)$ is the linear Brownian motion. The reader may note that (1) is the probability that a particle in Brownian motion on the line is not absorbed by the barriers $\alpha,-\alpha$ in the time interval $(0,2\pi G(\pi))$, given that the particle starts from 0 at time 0. Consider the fundamental solution $p(t,x)$ of the heat equation $\partial p/\partial t=\tfrac12\,\partial^2p/\partial x^2$ which is zero at $x=-\alpha,\alpha$ and behaves like a $\delta$-function at $x=0$ when $t\to0$. The absorbing barrier probability considered above is

$$\int_{-\alpha}^{\alpha}p(t,x)\,dx,$$

where $t=2\pi G(\pi)$. This absorbing barrier probability can be elegantly obtained by the method of images (see Sommerfeld [1], pp. 74-79). Table 1 is a table of the percentiles of $A(a)$.
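The method of images can be sketched as follows. This is our illustration, not the computation in Sommerfeld [1]. The density of Brownian motion started at 0 and absorbed at $\pm\alpha$ is $p(t,x)=\sum_k(-1)^k\varphi_t(x-2k\alpha)$, where $\varphi_t$ is the $N(0,t)$ density; the images cancel in pairs at $x=\pm\alpha$. Integrating over $(-\alpha,\alpha)$ gives $\sum_k(-1)^k[\Phi((2k+1)\alpha/\sqrt t)-\Phi((2k-1)\alpha/\sqrt t)]$, the series of Corollary 1. The Python code truncates the image sum at $|k|\le20$, which is our choice.

```python
import math


def phi_t(x, t):
    """N(0, t) density."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)


def big_phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))


def image_density(t, x, alpha, K=20):
    """p(t, x) for Brownian motion from 0 absorbed at -alpha and alpha (image sum)."""
    return sum((-1) ** abs(k) * phi_t(x - 2 * k * alpha, t) for k in range(-K, K + 1))


def survival_by_integration(t, alpha, steps=4000):
    """Simpson integral of p(t, x) over (-alpha, alpha)."""
    h = 2 * alpha / steps
    total = image_density(t, -alpha, alpha) + image_density(t, alpha, alpha)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * image_density(t, -alpha + i * h, alpha)
    return total * h / 3


def survival_series(t, alpha, K=20):
    """sum_k (-1)^k [Phi((2k+1) alpha / sqrt t) - Phi((2k-1) alpha / sqrt t)]."""
    r = alpha / math.sqrt(t)
    return sum((-1) ** abs(k) * (big_phi((2 * k + 1) * r) - big_phi((2 * k - 1) * r))
               for k in range(-K, K + 1))


print(survival_by_integration(1.0, 1.96), survival_series(1.0, 1.96))
```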

Table 1.
Table of percentiles of A(a).

A(a)      a        A(a)      a
.999    3.4808     .55      1.2126
.99     2.8070     .50      1.1490
.95     2.2414     .45      1.0892
.90     1.9600     .40      1.0322
.85     1.7805     .35       .9774
.80     1.6449     .30       .9238
.75     1.5341     .18524    .8
.70     1.4395     .10267    .7
.65     1.3562     .04136    .6
.60     1.2812     .00916    .5
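The entries of Table 1 can be recomputed from the series of Corollary 1. The Python sketch below uses the standard library only; the truncation level and bisection bracket are our choices. It evaluates $A(a)$ and inverts it by bisection, which works because $A$ is increasing in $a$.

```python
import math


def big_phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def A(a, K=60):
    """A(a) = P{max_{0<=t<=1} |B(t)| < a}, series truncated at |k| <= K."""
    if a <= 0:
        return 0.0
    return sum((-1) ** abs(k) * (big_phi((2 * k + 1) * a) - big_phi((2 * k - 1) * a))
               for k in range(-K, K + 1))


def percentile(p, lo=0.05, hi=10.0, iters=100):
    """Solve A(a) = p for a by bisection (A is increasing in a)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if A(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (0.999, 0.99, 0.95, 0.90, 0.50):
    print(p, round(percentile(p), 4))
```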

The corollary of Theorem 4 lends itself to important statistical applications. However, we first have to estimate $G(\pi)$, which is in general unknown. Note that

$$G(\pi)=\frac{1}{4\pi}\sum_{\nu=-\infty}^{\infty}r_\nu^2.$$

The statistics $C_\nu/N$ are consistent estimates of $r_\nu$. It is natural to consider an estimate of the form

$$G^*(\pi)=\frac{1}{4\pi}\sum_{|\nu|\le[kN^a]}\Big(\frac{C_\nu}{N}\Big)^2,\qquad C_{-\nu}=C_\nu.$$

One reason for this particular choice is that in practice one will try to avoid the cumbersome calculation of all the lagged product sums $C_\nu$.
Lemma 2. If $a_\nu=O(|\nu|^\beta)$, $\beta<-1$, the statistic $G^*(\pi)$ is a consistent estimate of $G(\pi)$.

Proof. We write $G^*(\pi)$ as the sum of two terms:

$$G^*(\pi)=\frac{1}{4\pi}\sum_{|\nu|\le m}\Big(\frac{C_\nu}{N}\Big)^2+\frac{1}{4\pi}\sum_{m<|\nu|\le[kN^a]}\Big(\frac{C_\nu}{N}\Big)^2. \qquad (2)$$

On using the representation of $x_t$ as a linear process, the expectation of the second term can be seen to be less than or equal to

$$\frac{1}{4\pi N^2}\sum_{m<|\nu|\le[kN^a]}\ \sum_{j,l=1}^{N}\ \sum_{\beta_1,\beta_2}\big\{|a_{j-\beta_1}a_{j+\nu-\beta_1}a_{l-\beta_2}a_{l+\nu-\beta_2}|+|a_{j-\beta_1}a_{j+\nu-\beta_2}a_{l-\beta_1}a_{l+\nu-\beta_2}|+|a_{j-\beta_1}a_{j+\nu-\beta_2}a_{l-\beta_2}a_{l+\nu-\beta_1}|\big\}(4+\mu_4)=(\Sigma_1+\Sigma_2+\Sigma_3)(4+\mu_4).$$

Let the covariances of the process $\sum_{\nu=-\infty}^{\infty}|a_{n-\nu}|\,\xi_\nu$ be denoted by $q_\nu$. The spectral density $\frac{1}{2\pi}\big|\sum_\nu|a_\nu|e^{-i\nu\lambda}\big|^2$ of this process is continuous and therefore quadratically integrable, so that $\sum_{\nu=-\infty}^{\infty}q_\nu^2<\infty$. But then

$$\Sigma_1=\frac{1}{4\pi}\sum_{m<|\nu|\le[kN^a]}q_\nu^2\le\frac{1}{4\pi}\sum_{|\nu|>m}q_\nu^2,$$

so that $\Sigma_1<\varepsilon$ for $m$ sufficiently large as $N\to\infty$. We also see that

$$\Sigma_2=\frac{1}{4\pi N^2}\sum_{m<|\nu|\le[kN^a]}\ \sum_{j,l=1}^{N}q_{j-l}^2\le\frac{2[kN^a]}{4\pi N}\sum_{n=-\infty}^{\infty}q_n^2\to0$$

as $N\to\infty$, provided $a<1$. The third sum $\Sigma_3$ can be handled in very much the same way, using the Schwarz inequality. Choose $m$ so large that the last sum in (2) is less than $\delta$ with probability larger than $1-\delta$ as $N\to\infty$. The first sum in (2) consists of a fixed number of terms, each of which converges in probability to the corresponding term in the expression for $G(\pi)$. The lemma is proved.
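A simulation sketch of the truncated estimate $G^*(\pi)=(4\pi)^{-1}\sum_{|\nu|\le[kN^a]}(C_\nu/N)^2$ follows. The setting is our own illustration:

- a Gaussian moving average $x_t=\xi_t+\theta\xi_{t-1}$, for which $r_0=1+\theta^2$ and $r_{\pm1}=\theta$;
- therefore $G(\pi)=[(1+\theta^2)^2+2\theta^2]/4\pi$;
- illustrative constants $k=1$, $a=1/2$.

```python
import math
import random


def lagged_sums(x, max_lag):
    """C_v = sum_{t=1}^{N-v} x_t x_{t+v} for v = 0, ..., max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + v] for t in range(n - v)) for v in range(max_lag + 1)]


def g_star(x, k=1.0, a=0.5):
    """Truncated estimate (1/(4 pi)) sum_{|v| <= [k N^a]} (C_v / N)^2."""
    n = len(x)
    m = int(k * n ** a)
    c = lagged_sums(x, m)
    total = (c[0] / n) ** 2 + 2 * sum((cv / n) ** 2 for cv in c[1:])
    return total / (4 * math.pi)


def simulate_ma1(n, theta, rng):
    """x_t = xi_t + theta xi_{t-1} with Gaussian xi (hypothetical test process)."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [xi[t + 1] + theta * xi[t] for t in range(n)]


theta = 0.5
exact = ((1 + theta ** 2) ** 2 + 2 * theta ** 2) / (4 * math.pi)
rng = random.Random(2024)
x = simulate_ma1(20000, theta, rng)
est = g_star(x)
print(est, exact)
```

The extra lags $m<|\nu|\le[kN^a]$ add a small upward bias of order $[kN^a]/N$, which is the term $\Sigma_2$ in the proof.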

Corollary 1 states that the asymptotic distribution of $\max_{0\le\lambda\le\pi}\sqrt N\,|F^*_N(\lambda)-F(\lambda)|$ depends only on the parameter $G(\pi)$ when $x_t$ is a normal process with a regular spectrum. This corollary and Lemma 2 enable us to construct confidence bands for $F(\lambda)$.
Theorem 5. Suppose that $x_t$ is a normal process and that its spectral density is positive and has an integrable second derivative. Then

$$\Big(F^*_N(\lambda)-\alpha\sqrt{\frac{2\pi G^*(\pi)}{N}},\ F^*_N(\lambda)+\alpha\sqrt{\frac{2\pi G^*(\pi)}{N}}\Big)$$

is an asymptotic confidence band for the spectral distribution function $F(\lambda)$ with confidence coefficient $A(\alpha)$.

The proof follows immediately from Corollary 1 and Lemma 2, since $A(\alpha)$ is continuous.
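As an illustration of Theorem 5, the Python sketch below works through one example under assumptions of our own:

- it simulates a hypothetical Gaussian MA(1) series $x_t=\xi_t+\theta\xi_{t-1}$;
- it computes $F^*_N(\lambda)=(2\pi N)^{-1}[C_0\lambda+2\sum_{\nu=1}^{N-1}C_\nu\sin\nu\lambda/\nu]$ on a grid;
- it estimates $G(\pi)$ by the truncated $G^*(\pi)$ with $k=1$, $a=1/2$ (our choice);
- it forms the band with $\alpha=2.2414$, the value that Table 1 pairs with $A(a)=.95$.

The band is only evaluated on a finite grid of $\lambda$ values here. Computing every $C_\nu$ costs $O(N^2)$ operations, so $N$ is kept moderate.

```python
import math
import random


def lagged_sums(x):
    """C_v = sum_{t=1}^{N-v} x_t x_{t+v} for all v = 0, ..., N-1 (O(N^2) work)."""
    n = len(x)
    return [sum(x[t] * x[t + v] for t in range(n - v)) for v in range(n)]


def f_star(c, n, lam):
    """F*_N(lam) = (C_0 lam + 2 sum_{v>=1} C_v sin(v lam) / v) / (2 pi N)."""
    s = c[0] * lam + 2 * sum(c[v] * math.sin(v * lam) / v for v in range(1, n))
    return s / (2 * math.pi * n)


def g_star(c, n, k=1.0, a=0.5):
    """Truncated estimate (1/(4 pi)) sum_{|v| <= [k N^a]} (C_v / N)^2."""
    m = int(k * n ** a)
    return ((c[0] / n) ** 2 + 2 * sum((c[v] / n) ** 2 for v in range(1, m + 1))) / (4 * math.pi)


def confidence_band(x, alpha, grid=50):
    """Band F*_N(lam) -/+ alpha sqrt(2 pi G*(pi) / N), evaluated on a lambda-grid in [0, pi]."""
    n = len(x)
    c = lagged_sums(x)
    half = alpha * math.sqrt(2 * math.pi * g_star(c, n) / n)
    band = []
    for i in range(grid + 1):
        lam = math.pi * i / grid
        value = f_star(c, n, lam)
        band.append((lam, value - half, value + half))
    return band, half, c


rng = random.Random(11)
theta, n = 0.5, 2000
xi = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [xi[t + 1] + theta * xi[t] for t in range(n)]  # hypothetical Gaussian MA(1)
band, half, c = confidence_band(x, alpha=2.2414)  # A(2.2414) = .95 by Table 1


def true_F(lam):
    """Spectral distribution of this MA(1) on [0, lam]."""
    return ((1 + theta ** 2) * lam + 2 * theta * math.sin(lam)) / (2 * math.pi)


covered = all(lo <= true_F(lam) <= hi for lam, lo, hi in band)
print(round(half, 4), covered)
```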

Theorem  5  also  gives  a  test  of  significance  for  the  simple  hypothesis  of 
a  completely  specified  spectrum.  Another  interesting  case  however  is  the 
following. 

Theorem 6. Consider a stochastic process $x_t$ satisfying the assumptions of Theorem 5. Assume that two independent time series of length $N_1$ and length $N_2$ generated by this process are observed, and that the corresponding estimates $F^*_{N_1}(\lambda)$, $F^*_{N_2}(\lambda)$, $G^*_1(\pi)$, $G^*_2(\pi)$ are computed. Then, if $N_1/N_2\to c>0$,

$$\frac{\sqrt N\max_{0\le\lambda\le\pi}|F^*_{N_1}(\lambda)-F^*_{N_2}(\lambda)|}{\sqrt{\pi\,[G^*_1(\pi)+G^*_2(\pi)]}}>\alpha,\qquad N=\frac{N_1N_2}{N_1+N_2},$$

is asymptotically a critical region of size $1-A(\alpha)$, as $N\to\infty$, for testing the hypothesis $F_1(\lambda)=F_2(\lambda)$.

The  proof  is  analogous  to  that  of  Theorem  5.  Both  these  theorems  are 
of  considerable  methodological  interest.  They  correspond  to  nonparametric 
statistical  techniques  since  the  parameter  space  of  the  class  of  admissible 
processes  is  the  space  of  regular  spectral  distribution  functions  (with  a 
distance  function  given  for  example  by  the  maximum  of  the  difference 
between two points of the space). But this space is, properly speaking, an infinite-dimensional space, since it cannot be mapped onto a finite-dimensional space by a continuous mapping.


6.7.  Remarks  on  the  Nonnormal  Case 

In the case $e\ne0$ we know very little. However, a little bit can be said, and it is worth stating.

First of all, there is an inequality for absorbing barrier probabilities of the type we are considering that is of some interest. Let $\eta_1(\lambda)$, $\eta_2(\lambda)$ be two normal processes with moments

$$E\,\eta_k(\lambda)=0,\qquad E\,\eta_k(\lambda)\eta_k(\mu)=e_kF_k(\lambda)F_k(\mu)+2\pi\int_0^{\min(\lambda,\mu)}f_k^2(l)\,dl,$$

$k=1,2$. Assume that $e_1\ge e_2\ge-2$ and that $f_1(\lambda)\ge f_2(\lambda)\ge0$ for all $\lambda$. (The expression for $E\,\eta_k(\lambda)\eta_k(\mu)$ will not be a covariance function unless $e_k\ge-2$.) Of course

$$F_k(\lambda)=\int_0^\lambda f_k(l)\,dl.$$

If the functions $f_k(\lambda)$ satisfy the regularity conditions we have imposed on the spectral density in the previous sections, one can choose representations of the processes $\eta_k(\lambda)$ that have continuous sample functions with probability one. The absorbing barrier probabilities

$$P\Big\{\max_{0\le\lambda\le\pi}|\eta_k(\lambda)|<\alpha\Big\}$$

are then well defined. Note that

$$E\,\eta_1(\lambda)\eta_1(\mu)-E\,\eta_2(\lambda)\eta_2(\mu) \qquad (1)$$

is a nonnegative definite function. This implies that, in distribution,

$$\eta_1(\lambda)=\eta_2(\lambda)+\sigma(\lambda),$$

where $\sigma(\lambda)$ is a normal process with covariance function (1), independent of $\eta_2(\lambda)$. Intuitively it then seems reasonable that

$$P\Big\{\max_{0\le\lambda\le\pi}|\eta_1(\lambda)|<\alpha\Big\}\le P\Big\{\max_{0\le\lambda\le\pi}|\eta_2(\lambda)|<\alpha\Big\}. \qquad (2)$$

This inequality is valid and follows from a result of T. W. Anderson [2]. Thus if $e<0$ we know that

$$P\Big\{\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha\Big\}\ge P\Big\{\max_{0\le t\le2\pi G(\pi)}|B(t)|<\alpha\Big\}=A\Big(\frac{\alpha}{x}\Big),$$

where $x=\sqrt{2\pi G(\pi)}$. One is therefore acting conservatively if one sets up confidence regions based on the limit theorem for normal $x_t$.

There are a few cases in which the limiting probability (6.5.2) can be explicitly evaluated. They can be used together with inequality (2) to get bounds on the absorbing barrier probability of interest.

When the process $x_t$ is pure white noise (so that $f(\lambda)$ is constant, say 1), the limiting probability of interest is given by
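$$P\Big\{\max_{0\le\lambda\le\pi}|\zeta(\lambda)|<\alpha\Big\}, \qquad (3)$$

where $\zeta(\lambda)$ is the normal process with mean zero and

$$E\,\zeta(\lambda)\zeta(\mu)=e\lambda\mu+2\pi\min(\lambda,\mu),$$

and it can be explicitly evaluated.

Theorem 7.

$$P\Big\{\max_{0\le\lambda\le\pi}|\zeta(\lambda)|<\alpha\Big\}=\sum_{k=-\infty}^{\infty}(-1)^k\exp\Big(\frac{k^2\alpha^2(\gamma-2)}{2\pi^2}\Big)\Big[\Phi\Big(\frac{\sqrt\gamma\,\alpha}{\pi}\Big(k+\frac1\gamma\Big)\Big)-\Phi\Big(\frac{\sqrt\gamma\,\alpha}{\pi}\Big(k-\frac1\gamma\Big)\Big)\Big],$$

where $\gamma=e+2$.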


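Proof. Let $\tau=\lambda/\pi$. We first introduce the following notation. Let $x$ be a normal stochastic variable with mean zero and variance one. Let $\chi(\tau)$ be a normal process with

$$E\,\chi(\tau)=0,\qquad E\,\chi(\tau)\chi(\tau')=\min(\tau,\tau')-\tau\tau',$$

that is, $\chi(\tau)$ is the linear Brownian motion conditioned so that $\chi(1)=0$. The variables $x$ and $\chi(\tau)$ are assumed to be independent of each other. Now consider the process

$$u(\tau)=\sqrt\gamma\,\pi x\tau+\pi\sqrt2\,\chi(\tau).$$

The process $u(\tau)$ has the same probability distribution as $\zeta(\lambda)$, so that

$$P\Big\{\max_{0\le\lambda\le\pi}|\zeta(\lambda)|<\alpha\Big\}=P\Big\{\max_{0\le\tau\le1}|u(\tau)|<\alpha\Big\}.$$

But

$$P\Big\{\max_{0\le\tau\le1}|u(\tau)|<\alpha\Big\}=\int_{-\infty}^{\infty}P\Big\{\frac{-\alpha-\sqrt\gamma\,\pi x\tau}{\pi\sqrt2}<\chi(\tau)<\frac{\alpha-\sqrt\gamma\,\pi x\tau}{\pi\sqrt2};\ 0\le\tau\le1\Big\}\varphi(x)\,dx, \qquad (4)$$

where $\varphi(x)$ is the normal frequency function. Let

$$Y(t)=(t+1)\,\chi\Big(\frac{t}{t+1}\Big),\qquad0\le t<\infty.$$

Then $Y(t)$, $0\le t<\infty$, is the linear Brownian motion (see J. L. Doob [1]).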
Now  consider 
$$P\{-a-(a+b)t<Y(t)<a+(a-b)t;\ 0\le t<\infty\}, \qquad (5)$$

where $a=\alpha/(\pi\sqrt2)$ and $b=\sqrt\gamma\,x/\sqrt2$. The integrand in (4) is zero unless $|\sqrt\gamma\,x|<\alpha/\pi$. Suppose that this inequality holds. Doob [1] has then evaluated (5) as

$$1-\sum_{m=1}^{\infty}\Big\{e^{-2[(2m-1)^2a^2-(2m-1)ba]}+e^{-2[(2m-1)^2a^2+(2m-1)ba]}-e^{-2[(2m)^2a^2-2mba]}-e^{-2[(2m)^2a^2+2mba]}\Big\}=1-2\sum_{n=1}^{\infty}(-1)^{n+1}e^{-2n^2a^2}\cosh2nba.$$


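Then (3) is equal to

$$\int_{-\alpha/(\pi\sqrt\gamma)}^{\alpha/(\pi\sqrt\gamma)}\Big[1-2\sum_{n=1}^{\infty}(-1)^{n+1}e^{-n^2\alpha^2/\pi^2}\cosh\frac{n\sqrt\gamma\,\alpha x}{\pi}\Big]\varphi(x)\,dx=\sum_{n=-\infty}^{\infty}(-1)^n\exp\Big(\frac{n^2\alpha^2(\gamma-2)}{2\pi^2}\Big)\Big[\Phi\Big(\frac{\sqrt\gamma\,\alpha}{\pi}\Big(n+\frac1\gamma\Big)\Big)-\Phi\Big(\frac{\sqrt\gamma\,\alpha}{\pi}\Big(n-\frac1\gamma\Big)\Big)\Big].$$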


Note that the fourth cumulant satisfies $e\ge-2$. Let the limiting distribution discussed above be denoted by $A_e(a)$. In the previous section this limiting distribution had been written down in the case of a normal process $x_t$; that is, $A(a)=A_0(a)$ had been obtained. Table 2 and Graph 1 describe the dependence of $A_e(a)$ on $e$ and $a$.

Table 2.
Table of a for fixed e and A_e(a).

A_e(a)   e = -2   e = -1   e = 0   e = 1   e = 3   e = 5   e = 10
.900     1.224    1.603    1.960   2.271   2.943   3.375   4.264
.950     1.358    1.800    2.241   2.624   3.396   3.923   5.001
.990     1.628    2.194    2.807   3.333   4.307   5.019   6.464
.999     1.949    2.664    3.481   4.174   5.390   6.315   8.183
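The proof of Theorem 7 also suggests a numerically safe way to evaluate $A_e(a)$. Integrate Doob's expression (5), written as $\sum_n(-1)^ne^{-2n^2a^2+2nab}$, against $\varphi(x)$ over $|x|<\alpha/(\pi\sqrt\gamma)$, as in (4). Every exponent is then nonpositive, whereas the factor $e^{k^2\alpha^2(\gamma-2)/2\pi^2}$ in the closed-form series can overflow in floating point when $e$ is large.

We assume that the $a$ of Table 2 is $\alpha$ measured in units of $\sqrt{2\pi G(\pi)}=\pi\sqrt2$ (the white-noise case $f\equiv1$), that is, $\alpha=a\pi\sqrt2$. With this reading the $e=0$ column coincides with Table 1. The sketch requires $e>-2$, and it reproduces, for example, the entries 1.603, 1.960 and 2.271 of the .900 row.

```python
import math


def doob_series(a, b, terms=30):
    """sum_n (-1)^n exp(-2 n^2 a^2 + 2 n a b); equals Doob's (5) when |b| < a."""
    return sum((-1) ** abs(n) * math.exp(-2.0 * n * n * a * a + 2.0 * n * a * b)
               for n in range(-terms, terms + 1))


def A_e(a, e, steps=2000):
    """A_e(a) = P{max |zeta| < alpha} with alpha = a pi sqrt(2), via (4) and (5) and Simpson's rule."""
    gamma = e + 2.0
    if gamma <= 0:
        raise ValueError("requires e > -2")
    c = a * math.sqrt(2.0 / gamma)  # integrand of (4) vanishes for |x| >= alpha / (pi sqrt(gamma))
    h = 2.0 * c / steps
    total = 0.0
    for i in range(steps + 1):
        x = -c + i * h
        b = math.sqrt(gamma / 2.0) * x
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi) * doob_series(a, b)
    return total * h / 3.0


for e, a in ((-1, 1.603), (0, 1.960), (1, 2.271)):
    print(e, a, round(A_e(a, e), 4))
```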
6.8. Spectral Analysis with a Regression Present

Theorems 5 and 6 are somewhat restricted in their applicability as they stand, since $x_t$ is a linear process with mean value zero. In many contexts the observed process is $y_t=x_t+m_t$, where $m_t=Ey_t$ is not identically zero. Consider the case in which

$$m_t=d_1\varphi_t^{(1)}+\dots+d_p\varphi_t^{(p)},$$

where $\varphi_t^{(1)},\dots,\varphi_t^{(p)}$ are given real-valued sequences and the regression coefficients $d_\nu$ are unknown. Let $d_1^*,\dots,d_p^*$ be the least squares estimates of $d_1,\dots,d_p$. It then seems intuitively plausible that Theorems 5 and 6 will still hold if $F^*_N(\lambda)$ is computed using $y_t-d_1^*\varphi_t^{(1)}-\dots-d_p^*\varphi_t^{(p)}$ in place of $x_t$. This is shown to be true. To avoid unnecessary complications we confine ourselves to the case $p=2$, which will illustrate the general situation. We have to introduce the following condition, which prevents the two regression variables $\varphi_t^{(1)}$ and $\varphi_t^{(2)}$ from becoming linearly dependent in the limit:


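$$R=\lim_{N\to\infty}\frac{\Big(\sum_{t=1}^{N}\varphi_t^{(1)}\varphi_t^{(2)}\Big)^2}{\sum_{t=1}^{N}\varphi_t^{(1)2}\sum_{t=1}^{N}\varphi_t^{(2)2}}<1. \qquad (1)$$

Theorem 8. Under the conditions of Theorems 5 and 6 and (1), these theorems remain valid if $F^*_N(\lambda)$ is computed using $y_t-d_1^*\varphi_t^{(1)}-d_2^*\varphi_t^{(2)}$ in place of $x_t$.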

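Proof. It is sufficient to prove that

$$\sqrt N\max_{0\le\lambda\le\pi}\Big|\int_0^\lambda\frac{1}{2\pi N}\Big|\sum_{t=1}^{N}\big(y_t-d_1^*\varphi_t^{(1)}-d_2^*\varphi_t^{(2)}\big)e^{itl}\Big|^2dl-\int_0^\lambda\frac{1}{2\pi N}\Big|\sum_{t=1}^{N}x_te^{itl}\Big|^2dl\Big|$$

tends to zero in probability as $N\to\infty$. The expression inside the absolute value sign is

$$\frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}\Big[-2(d_1^*-d_1)\varphi_\nu^{(1)}x_\mu-2(d_2^*-d_2)\varphi_\nu^{(2)}x_\mu+(d_1^*-d_1)^2\varphi_\nu^{(1)}\varphi_\mu^{(1)}+(d_2^*-d_2)^2\varphi_\nu^{(2)}\varphi_\mu^{(2)}+2(d_1^*-d_1)(d_2^*-d_2)\varphi_\nu^{(1)}\varphi_\mu^{(2)}\Big]\frac{\sin(\nu-\mu)\lambda}{\nu-\mu}=\Sigma_1+\Sigma_2+\Sigma_3+\Sigma_4+\Sigma_5,$$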
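where the usual convention is used for terms with $\nu=\mu$. The least squares estimates $d_1^*$, $d_2^*$ are unbiased linear estimates with coefficients depending only upon the $\varphi^{(1)}$'s and $\varphi^{(2)}$'s. Now

$$d_1^*=\frac{\sum_{\nu=1}^{N}y_\nu\Big[\varphi_\nu^{(1)}\sum_{\tau=1}^{N}\varphi_\tau^{(2)2}-\varphi_\nu^{(2)}\sum_{\tau=1}^{N}\varphi_\tau^{(1)}\varphi_\tau^{(2)}\Big]}{\sum_{\tau=1}^{N}\varphi_\tau^{(1)2}\sum_{\tau=1}^{N}\varphi_\tau^{(2)2}-\Big(\sum_{\tau=1}^{N}\varphi_\tau^{(1)}\varphi_\tau^{(2)}\Big)^2},$$

and $d_2^*$ is obtained by interchanging $\varphi^{(1)}$ and $\varphi^{(2)}$ in the expression for $d_1^*$.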

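A simple argument using the fact that the spectral density is bounded shows that

$$\operatorname{var}d_1^*\le2\pi\max_{0\le\lambda\le\pi}f(\lambda)\,\frac{\sum_{\tau=1}^{N}\varphi_\tau^{(2)2}}{\sum_{\tau=1}^{N}\varphi_\tau^{(1)2}\sum_{\tau=1}^{N}\varphi_\tau^{(2)2}-\Big(\sum_{\tau=1}^{N}\varphi_\tau^{(1)}\varphi_\tau^{(2)}\Big)^2}\sim\frac{2\pi\max_{0\le\lambda\le\pi}f(\lambda)}{1-R}\cdot\frac{1}{\sum_{\nu=1}^{N}\varphi_\nu^{(1)2}}. \qquad (2)$$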


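But

$$\sqrt N\max_{0\le\lambda\le\pi}|\Sigma_1|\le\frac{\sqrt N}{\pi N}\,|d_1^*-d_1|\Big[\sum_{p=1}^{N-1}\frac1p\Big|\sum_{\nu=1}^{N-p}x_\nu\varphi_{\nu+p}^{(1)}\Big|+\pi\Big|\sum_{\nu=1}^{N}x_\nu\varphi_\nu^{(1)}\Big|+\sum_{p=-N+1}^{-1}\frac{1}{|p|}\Big|\sum_{\nu=1-p}^{N}x_\nu\varphi_{\nu+p}^{(1)}\Big|\Big].$$

Moreover,

$$E\Big|\sum_{\nu=1}^{N-p}x_\nu\varphi_{\nu+p}^{(1)}\Big|^2\le2\pi\max_{0\le\lambda\le\pi}f(\lambda)\sum_{\nu=1}^{N}\varphi_\nu^{(1)2}. \qquad (3)$$

From (2) we know that, with as large a probability as desired, $|d_1^*-d_1|$ is less than $k\big/\big(\sum_{\nu=1}^{N}\varphi_\nu^{(1)2}\big)^{1/2}$. It then follows from the Schwarz inequality and (3) that with large probability

$$\sqrt N\max_{0\le\lambda\le\pi}|\Sigma_1|\le K\,\frac{\log N}{\sqrt N}\to0.$$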


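The sum $\Sigma_2$ can be handled in the same way. Now, since $\Sigma_3$ is $(d_1^*-d_1)^2$ times a nonnegative integral of a squared trigonometric sum,

$$\sqrt N\max_{0\le\lambda\le\pi}|\Sigma_3|\le\frac{1}{\sqrt N}\,(d_1^*-d_1)^2\sum_{\nu=1}^{N}\varphi_\nu^{(1)2},$$

and by (2) the expectation of this tends to zero as $N\to\infty$. $\Sigma_4$ and $\Sigma_5$ can be treated in a similar manner, and this proves the theorem.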

An important special case is that in which $\varphi_\nu^{(1)}\equiv1$. This corresponds to an unknown mean value of the process. Another situation of some interest arises when the spectrum of the process has a discrete component with frequencies $\lambda_1,\lambda_2,\dots,\lambda_p$. We then take the $\varphi_\nu$'s as trigonometric functions with these frequencies.
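The residual series of Theorem 8 is straightforward to form. The Python sketch below is our illustration and works as follows:

- the regressors are $\varphi_t^{(1)}\equiv1$ (an unknown mean) and $\varphi_t^{(2)}=\cos\lambda_1t$ for a hypothetical known frequency $\lambda_1$;
- $d_1^*$ is computed from the explicit formula above, and $d_2^*$ by interchanging the roles of the two regressors;
- it returns $y_t-d_1^*\varphi_t^{(1)}-d_2^*\varphi_t^{(2)}$ together with the finite-$N$ value of the ratio in condition (1).

The residuals can then be passed to any of the estimates of $F(\lambda)$ used above.

```python
import math
import random


def two_regressor_residuals(y, phi1, phi2):
    """Least squares fit of y on phi1, phi2; returns (d1*, d2*, residuals, R_N)."""
    s11 = sum(p * p for p in phi1)
    s22 = sum(q * q for q in phi2)
    s12 = sum(p * q for p, q in zip(phi1, phi2))
    det = s11 * s22 - s12 * s12
    if det <= 0:
        raise ValueError("regressors are linearly dependent")
    y1 = sum(v * p for v, p in zip(y, phi1))
    y2 = sum(v * q for v, q in zip(y, phi2))
    d1 = (y1 * s22 - y2 * s12) / det  # formula for d1* above
    d2 = (y2 * s11 - y1 * s12) / det  # same formula with phi1, phi2 interchanged
    resid = [v - d1 * p - d2 * q for v, p, q in zip(y, phi1, phi2)]
    r_n = s12 * s12 / (s11 * s22)  # finite-N version of R in condition (1)
    return d1, d2, resid, r_n


rng = random.Random(5)
n, lam1 = 500, 0.9
phi1 = [1.0] * n
phi2 = [math.cos(lam1 * (t + 1)) for t in range(n)]
y = [3.0 + 2.0 * phi2[t] + rng.gauss(0.0, 1.0) for t in range(n)]  # hypothetical d1 = 3, d2 = 2
d1, d2, resid, r_n = two_regressor_residuals(y, phi1, phi2)
print(round(d1, 3), round(d2, 3), r_n)
```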


6.9.  Alternative  Estimates  of  the  Spectral  Distribution  Function 

It is of great interest to see whether Theorems 4-6 and 8 are still valid if the estimate $F^*_N(\lambda)$ is replaced by a truncated estimate

$$F^*_{N,\mathrm{tr}}(\lambda)=\frac{C_0}{2\pi N}\lambda+\frac{1}{\pi N}\sum_{\nu=1}^{h_N}C_\nu\,\frac{\sin\nu\lambda}{\nu}=\int_0^\lambda\int_{-\pi}^{\pi}I_N(\mu)\,\frac{\sin\big[(h_N+\frac12)(l-\mu)\big]}{2\pi\sin\frac{l-\mu}{2}}\,d\mu\,dl,$$

where $h_N\to\infty$ as $N\to\infty$. Here the weighting factors are 1 from $\nu=0$ up to $\nu=h_N$ and zero from that point on. The estimate $F^*_{N,\mathrm{tr}}(\lambda)$ is, in general, not nondecreasing with probability 1, and this can at times be an unpleasant feature. One might be interested in choosing $h_N=[kN^a]$. Such estimates are of considerable interest because they reduce the computational work, as one only needs to compute the $C_\nu$'s for $\nu\le[kN^a]$.

We shall now consider a class of estimates of the spectral distribution function with nondecreasing weight functions. Let

$$W_N(\lambda)=\int_{-\pi}^{\lambda}w_N(l)\,dl,$$

where $w_N(l)\ge0$ in $(-\pi,\pi)$, $\int_{-\pi}^{\pi}w_N(l)\,dl=1$, and

$$\lim_{N\to\infty}W_N(l)=\begin{cases}0 & \text{if } l<0,\\ 1 & \text{if } l>0.\end{cases}$$
Let

$$F^*_N(\lambda,W_N)=\int_0^\pi I_N(l)\,W_N(\lambda-l)\,dl,\qquad F(\lambda,W_N)=\int_0^\pi f(l)\,W_N(\lambda-l)\,dl.$$

We prove the following theorem.

Theorem 9. Under the conditions of Theorem 4,

$$\lim_{N\to\infty}P\Big\{\max_{0\le\lambda\le\pi}\sqrt N\,|F^*_N(\lambda,W_N)-F(\lambda,W_N)|<\alpha\Big\}=P\Big\{\max_{0\le\lambda\le\pi}|\eta(\lambda)|<\alpha\Big\}.$$

Proof. On integrating by parts we have

$$\sqrt N[F^*_N(\lambda,W_N)-F(\lambda,W_N)]=\sqrt N\int_0^\pi[I_N(l)-f(l)]\,dl\;W_N(\lambda-\pi)+\int_0^\pi\sqrt N\int_0^l[I_N(\mu)-f(\mu)]\,d\mu\;w_N(\lambda-l)\,dl. \qquad (2)$$

By Theorem 2 we can replace (2) by

$$\sqrt N\int_0^\pi f(l)[2\pi I_{N,\xi}(l)-1]\,dl\;W_N(\lambda-\pi)+\int_0^\pi\sqrt N\int_0^l[2\pi I_{N,\xi}(\mu)-1]f(\mu)\,d\mu\;w_N(\lambda-l)\,dl,$$

committing an error of at most $\varepsilon>0$ uniformly in $\lambda$ with probability $1-\varepsilon$. On integrating by parts twice we obtain


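$$\sqrt N\,2\pi\Big[F^*_{N,\xi}(\pi)-\frac12\Big]f(\pi)W_N(\lambda-\pi)-\int_0^\pi\sqrt N\,2\pi\Big[F^*_{N,\xi}(l)-\frac{l}{2\pi}\Big]\frac{d}{dl}\big[f(l)W_N(\lambda-l)\big]\,dl. \qquad (3)$$

But

$$\max_{0\le\lambda\le\pi}\Big|\sqrt N\Big(F^*_{N,\xi}(\lambda)-\frac{\lambda}{2\pi}\Big)-s_{N,m}(\lambda)\Big|<\varepsilon$$

with probability $1-\varepsilon$, where $m$ is a large but fixed number. We can then replace (3) by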
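$$2\pi\int_0^\pi f(l)\,W_N(\lambda-l)\,s'_{N,m}(l)\,dl,$$

where

$$\int_0^\pi f(l)W_N(\lambda-l)\,s'_{N,m}(l)\,dl=\frac{c_0-N}{2\pi\sqrt N}\int_0^\pi f(l)W_N(\lambda-l)\,dl+\frac1\pi\sum_{\nu=1}^{m}\frac{c_\nu}{\sqrt N}\int_0^\pi\cos\nu l\,f(l)\,W_N(\lambda-l)\,dl,$$

with an error of at most $\varepsilon$ uniformly in $\lambda$. Reasoning as in Theorem 4, we have

$$\lim_{N\to\infty}P\Big\{\max_{0\le\lambda\le\pi}2\pi\Big|\int_0^\pi f(l)W_N(\lambda-l)\,s'_{N,m}(l)\,dl\Big|<\alpha\Big\}=P\Big\{\max_{0\le\lambda\le\pi}2\pi\Big|\int_0^\lambda f(l)\,s'_m(l)\,dl\Big|<\alpha\Big\},$$

making use of the fact that

$$\max_{0\le\lambda\le\pi}\Big|\int_0^\pi\cos\nu l\,f(l)W_N(\lambda-l)\,dl-\int_0^\lambda\cos\nu l\,f(l)\,dl\Big|\le\max_{0\le\lambda\le\pi}f(\lambda)\Big(\int_{-\pi}^{0}W_N(l)\,dl+\int_0^\pi|W_N(l)-1|\,dl\Big)\to0$$

as $N\to\infty$. But

$$\max_{0\le\lambda\le\pi}\Big|2\pi\int_0^\lambda f(l)\,s'_m(l)\,dl-\eta(\lambda)\Big|<\varepsilon$$

with large probability if $m$ is sufficiently large, and the theorem follows immediately.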

One is usually more interested in estimating $F(\lambda)$ than $F(\lambda,W_N)$. The following corollary enables us to do this.

Corollary 2. Theorems 4-6 and 8 remain valid when $F^*_N(\lambda)$ is replaced by $F^*_N(\lambda,W_N)$ if

$$\int_{-\pi}^{0}W_N(\lambda)\,d\lambda+\int_0^\pi[1-W_N(\lambda)]\,d\lambda=o(N^{-1/2}). \qquad (4)$$

Proof. The proof follows immediately, as

$$\max_{0\le\lambda\le\pi}\sqrt N\,|F(\lambda)-F(\lambda,W_N)|\le\sqrt N\max_{0\le\lambda\le\pi}f(\lambda)\Big(\int_{-\pi}^{0}W_N(l)\,dl+\int_0^\pi[1-W_N(l)]\,dl\Big)=o(1).$$


It is clear from the above comments that the time series analyst has a large class of estimates of the spectral distribution function at his disposal. It is worth noting that the Fejér kernel

$$w_N(\lambda)=\frac{1}{2\pi h_N}\,\frac{\sin^2(h_N\lambda/2)}{\sin^2(\lambda/2)}$$

satisfies (4) if $\log h_N/h_N=o(N^{-1/2})$, so that we can choose the truncation point $h_N$ as $h_N=[kN^a]$, $\frac12<a<1$. The estimate corresponding to this kernel is

$$F^*_{N,F}(\lambda)=\frac{C_0}{2\pi N}\lambda+\frac{1}{\pi N}\sum_{\nu=1}^{h_N}\Big(1-\frac{\nu}{h_N}\Big)C_\nu\,\frac{\sin\nu\lambda}{\nu}.$$

This estimate of the spectral distribution function is closely related to an estimate of the spectral density given by Bartlett (see section 4.6). It is nondecreasing and does not require the computation of all the $C_\nu$'s.
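A sketch of the Fejér-weighted estimate follows. It is our code; the constants $k=1$, $a=0.6$ and the white-noise input are illustrative. Only the lags $\nu\le h_N$ are computed.

The weights $1-\nu/h_N$ are the Fourier coefficients of the nonnegative Fejér kernel. The implied density estimate $(2\pi N)^{-1}[C_0+2\sum_{\nu=1}^{h_N}(1-\nu/h_N)C_\nu\cos\nu\lambda]$ is therefore a smoothed periodogram and hence nonnegative, so the estimate is nondecreasing. The code checks this on a grid.

```python
import math
import random


def fejer_estimate(x, k=1.0, a=0.6, grid=400):
    """Fejer-weighted estimate of F on a grid in [0, pi], using only lags v <= h_N = [k N^a]."""
    n = len(x)
    h = max(1, int(k * n ** a))
    c = [sum(x[t] * x[t + v] for t in range(n - v)) for v in range(h + 1)]
    lams = [math.pi * i / grid for i in range(grid + 1)]
    values = []
    for lam in lams:
        s = c[0] * lam + 2 * sum((1 - v / h) * c[v] * math.sin(v * lam) / v for v in range(1, h + 1))
        values.append(s / (2 * math.pi * n))
    return lams, values


rng = random.Random(8)
x = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # white noise, so F(pi) = 1/2
lams, F = fejer_estimate(x)
print(F[len(F) // 2], F[-1])
```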

Using an argument analogous to that given above, one can easily see that Theorems 4-6 and 8 are still valid if $F^*_N(\lambda)$ is replaced by $F^*_{N,\mathrm{tr}}(\lambda)$, where $(\log h_N)/h_N=o(N^{-1/2})$. If the spectral density $f(\lambda)$ is continuous and analytic for all $\lambda$ in $(-\pi,\pi)$, where $-\pi$ is identified with $\pi$, Theorems 4-6 and 8 are valid with $F^*_N(\lambda)$ replaced by $F^*_{N,\mathrm{tr}}(\lambda)$ when $h_N=[kN^a]$, $0<a<1$.


6.10.  Alternative  Statistics  and  the  Corresponding  Limit  Theorems 

In some cases one might prefer to consider statistics other than $\max_{0\le\lambda\le\pi}|F^*_N(\lambda)-F(\lambda)|$ in dealing with the spectral distribution function $F(\lambda)$. We shall consider some alternative statistics in this section.

Consider the linear space consisting of continuous functions $c=c(\lambda)$ on $0\le\lambda\le\pi$ with the norm $\|c\|=\sup|c(\lambda)|$. Let $\varphi(c)$ be a functional uniformly continuous in this topology.

Theorem 10. Under the conditions of Theorem 4,

$$\lim_{N\to\infty}P\{\varphi(\sqrt N[F^*_N(\lambda)-F(\lambda)])<\alpha\}=P\{\varphi(\eta(\lambda))<\alpha\}.$$
Proof. Writing

$$\sqrt N[F^*_N(\lambda)-F(\lambda)]=s_{N,k}(\lambda)+r_{N,k}(\lambda)$$

as before, we note that

$$\|r_{N,k}(\lambda)\|<\varepsilon$$

with probability $1-\varepsilon$ if $k$ is chosen sufficiently large. We therefore commit only a small error by considering instead the probability of the event

$$\{\varphi(s_{N,k}(\lambda))<\alpha\},$$

which is a closed set in $(k+1)$-space. This probability converges to

$$P\{\varphi(s_k(\lambda))<\alpha\}$$

as $N\to\infty$. But choose $k$ so large that

$$\|s_k(\lambda)-\eta(\lambda)\|<\delta$$

with probability $1-\delta$. This, together with the uniform continuity of $\varphi(c)$, proves the theorem.

An example of some interest is

$$\varphi(c)=\Big[\int_0^\pi|c(\lambda)|^p\,d\mu(\lambda)\Big]^{1/p},\qquad p\ge1,$$

where $\mu(\lambda)$ is bounded and nondecreasing. This will give a statistic of the von Mises type.

It is also of interest to consider the estimated spectral mass in several fixed nonoverlapping $\lambda$-intervals. Both $\sqrt N[F^*_N(\lambda)-F(\lambda)]$ and $\eta(\lambda)$ can be extended to interval functions in the usual sense, that is,

$$\sqrt N[F^*_N(I)-F(I)]=\sqrt N[F^*_N(\lambda_2)-F(\lambda_2)]-\sqrt N[F^*_N(\lambda_1)-F(\lambda_1)],$$

where $I=(\lambda_1,\lambda_2)$. It is then clear that the joint distribution of $\sqrt N[F^*_N(I_1)-F(I_1)],\dots,\sqrt N[F^*_N(I_k)-F(I_k)]$, where $I_1,\dots,I_k$ are nonoverlapping intervals in $(0,\pi)$, is asymptotically that of $\eta(I_1),\dots,\eta(I_k)$ as $N\to\infty$.

6.11.  Confidence  Band  for  the  Spectral  Density 

In the previous sections the distribution theory associated with the estimation of the spectral distribution function $F(\lambda)$ was discussed in some detail. In particular, in Section 6.6, confidence bands for $F(\lambda)$ were set up when the observed process $x_t$ is normal. The results are less appealing in the nonnormal case because they depend on the parameter $e$. The computation of limiting probabilities becomes much more complicated, and it does not seem as if they could easily be put into a form that is nonparametric. Moreover, even though $e$ can be estimated consistently, we have not been able to find an estimate that would be useful in practice (see section 6.5).

In this section we shall discuss heuristically the setting up of confidence bands for the spectral density $f(\lambda)$. It is significant that the asymptotic distribution associated with these confidence bands does not depend on $e$ and can be given a nonparametric form for the class of linear processes with regular spectra. This might have been anticipated, since we have already seen in section 4.2 that the limiting covariance of spectrograph estimates does not depend on $e$. This is of interest from an applied point of view, because it means that estimation of the spectral density can be carried out within bounds without worrying about $e$.

We do not have rigorous proofs of many of the results in this section. However, they are of considerable interest, and so we have given a heuristic derivation. We shall indicate conditions we believe to be intuitively plausible, but the reader should take these with the customary grain of salt. In this discussion consider

$$\frac1h\int_{kh}^{(k+1)h}I_N(\lambda)\,d\lambda$$

as an estimate of $f((k+\frac12)h)$, and confidence bands for $f(\lambda)$ about this estimate. It will also be clear that one can consider truncated estimates, or estimates of the form

(k  +  l)h    n 

-  IN{x)wN{x  —  X)  dx  dX, 

kh      —n 

where  wN(x)  is  a  weight  function  which  approaches  a  delta  function  at 
x^Oas  iV-^oo.  We  are  interested  in  these  estimates  as  N,  l/h^oo  simul- 
taneously but  where  l/h->oo  more  slowly  than  N.  However,  we  shall  only 
prove  a  limit  theorem  for  the  case  in  which  N^oo  first  and  then  l/h->oo. 
Of  course,  the  gap  in  the  argument  will  arise  when  we  claim  that  we  can 
then  let  N,l/h->'oo  simultaneously,  l/h=o  (N),  and  obtain  the  same 
limit  theorem.  This  is  the  case  of  practical  interest.  If  a  modified  estimate 
of  the  type  indicated  above  with  weight  function  wN(x)  is  used,  it  seems 
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intuitively  plausible  that  wN  (x)  should  approach  a  delta  function  at  x  =  0 
slower  than  (l/N)  and  faster  than  (h). 
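A minimal numerical sketch of the band estimate $\frac{1}{h}\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda$ may help fix ideas. It assumes the periodogram normalization $I_N(\lambda) = (2\pi N)^{-1}|\sum_{t=1}^{N} x_t e^{-it\lambda}|^2$ and approximates each band integral by the midpoint rule; the function names and the grid size `n_grid` are illustrative choices, not taken from the text.

```python
import cmath
import math


def periodogram(x, lam):
    """I_N(lam) = |sum_{t=1}^N x_t e^{-i t lam}|^2 / (2 pi N) (assumed normalization)."""
    n = len(x)
    s = sum(xt * cmath.exp(-1j * t * lam) for t, xt in enumerate(x, start=1))
    return abs(s) ** 2 / (2 * math.pi * n)


def band_estimates(x, m, n_grid=64):
    """(1/h) * integral of I_N over [kh, (k+1)h] for k = 0..m-1, with h = pi/m.

    Each band integral is approximated by the midpoint rule on n_grid points,
    so each returned value is an average of I_N over its band.
    """
    h = math.pi / m
    step = h / n_grid
    estimates = []
    for k in range(m):
        total = sum(periodogram(x, k * h + (j + 0.5) * step) for j in range(n_grid))
        estimates.append(total * step / h)
    return estimates
```

For example, `band_estimates(x, 8)` returns eight values, one for each band of width $\pi/8$; multiplied by $h$ they estimate the spectral masses $\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda$.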

We shall now make use of Theorem 4 in the plausibility argument. Consider the stochastic variables

$$Z_k = \sqrt{N}\left[\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda - \int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\right],$$

where $h = \pi/m$ and $k = 0, 1, \ldots, m-1$. In the limit $N \to \infty$ they have the same joint distribution as the stochastic variables

$$\eta((k+1)h) - \eta(kh), \qquad k = 0, 1, \ldots, m-1,$$

that is, they are normal with mean zero and covariance

$$\varepsilon\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\int_{jh}^{(j+1)h} f(\lambda)\,d\lambda + 2\pi\,\delta_{jk}\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda.$$

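The stochastic variables $\eta((k+1)h) - \eta(kh)$, $k = 0, 1, \ldots, m-1$, behave almost like independent normal variables, and it is worth while seeing how they differ from independent variables. Consider a normal variable $X$, independent of the process $\eta(\lambda)$, with mean zero and variance 1. The stochastic variables

$$X\sqrt{2}\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda + \eta((k+1)h) - \eta(kh), \qquad k = 0, 1, \ldots, m-1, \tag{1}$$

are normal with mean zero and covariance

$$(\varepsilon + 2)\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\int_{jh}^{(j+1)h} f(\lambda)\,d\lambda + 2\pi\,\delta_{jk}\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda,$$

since the independent term in $X$ adds $2\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\int_{jh}^{(j+1)h} f(\lambda)\,d\lambda$ to the covariance.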

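Let $Y, Y_0, Y_1, \ldots, Y_{m-1}$ be independent normal variables with mean zero and variance 1. Then, since $\varepsilon + 2 > 0$, the stochastic variables (1) have the same probability structure as

$$\sqrt{\varepsilon+2}\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\;Y + \sqrt{2\pi\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda}\;Y_k, \qquad k = 0, 1, \ldots, m-1.$$

If all these stochastic variables are appropriately defined on the same probability space, we can write

$$\eta((k+1)h) - \eta(kh) = \sqrt{2\pi\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda}\;Y_k + \sqrt{\varepsilon+2}\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\;Y - \sqrt{2}\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\;X. \tag{2}$$

We are interested in the following limiting probability

$$\lim_{h\to 0} P\left[\max_k \frac{|\eta((k+1)h) - \eta(kh)|}{\sqrt{2\pi\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda}} < \Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)\right], \tag{3}$$

where $h = \pi/m$ and $\Phi(x)$ is the normal distribution function; $\Phi^{-1}$ denotes the inverse function. Given any fixed pair of values for $X$ and $Y$, the probability of the set

$$\max_k \frac{|\eta((k+1)h) - \eta(kh)|}{\sqrt{2\pi\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda}} < \Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)$$

converges as $m \to \infty$ to

$$\lim_{m\to\infty} P\left(\max_k |Y_k| < \Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)\right) = \lim_{m\to\infty}\left(1 - \frac{2\alpha}{m}\right)^m = e^{-2\alpha}.$$

This is clear since the coefficient of $Y_k$ in (2) is essentially $\sqrt{2\pi h}\,f(kh)$, while the coefficients of $Y$ and $X$ are essentially $\sqrt{\varepsilon+2}\,h f(kh)$ and $\sqrt{2}\,h f(kh)$ respectively. After division by the coefficient of $Y_k$, the terms in $Y$ and $X$ are of order $\sqrt{h}$, and since $\sqrt{h}\,\Phi^{-1}(1 - \alpha/m)$ is of order $\sqrt{\log m/m} \to 0$, they do not affect the limit. But the limiting probability (3) is then $e^{-2\alpha}$.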
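The last step uses only the independence of the $Y_k$: each $|Y_k|$ stays below $\Phi^{-1}(1 - \alpha/m)$ with probability $1 - 2\alpha/m$, so the maximum does so with probability $(1 - 2\alpha/m)^m \to e^{-2\alpha}$. The short check below evaluates this exactly with Python's `statistics.NormalDist` (Python 3.8 or later); it is a numerical illustration, not part of the original argument, and the value $\alpha = 0.05$ is arbitrary.

```python
import math
from statistics import NormalDist


def max_abs_normal_prob(alpha, m):
    """P(max_k |Y_k| < Phi^{-1}(1 - alpha/m)) for m independent standard normals."""
    std = NormalDist()
    c = std.inv_cdf(1 - alpha / m)
    p_single = std.cdf(c) - std.cdf(-c)  # equals 1 - 2*alpha/m
    return p_single ** m


alpha = 0.05
for m in (10, 100, 1000, 10000):
    print(m, max_abs_normal_prob(alpha, m), math.exp(-2 * alpha))
```

The printed probabilities approach $e^{-2\alpha}$ as $m$ grows.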
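We have thus shown that

$$\lim_{m\to\infty}\lim_{N\to\infty} P\left[\max_k \frac{\sqrt{N}\left|\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda - \int_{kh}^{(k+1)h} f(\lambda)\,d\lambda\right|}{\sqrt{2\pi\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda}} < \Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)\right] = e^{-2\alpha}.$$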

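The stochastic variables

$$\sqrt{N}\,\log\frac{\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda}{\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda}, \qquad k = 0, 1, \ldots, m-1,$$

have the same distribution as $N \to \infty$ as

$$\frac{\eta((k+1)h) - \eta(kh)}{\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda}, \qquad k = 0, 1, \ldots, m-1.$$

Using an argument similar to that given above, and noting that for a continuous positive $f$ the ratio $2\pi\int_{kh}^{(k+1)h} f^2(\lambda)\,d\lambda\,/\,(\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda)^2$ is approximately $2\pi/h$ for small $h$, one obtains the following limit theorem

$$\lim_{m\to\infty}\lim_{N\to\infty} P\left[\max_k \sqrt{\frac{Nh}{2\pi}}\left|\log\frac{\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda}{\int_{kh}^{(k+1)h} f(\lambda)\,d\lambda}\right| < \Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)\right] = e^{-2\alpha}. \tag{4}$$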

The different forms of the limit theorem given above suggest confidence bands for the spectral mass. Perhaps the last form is the most interesting, because it involves the logarithm of the ratio of the estimated to the true spectral mass. This is a rather natural statistic.
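Read as a recipe, (4) suggests the following: with $N$ observations and $m$ bands of width $h = \pi/m$, put $c = \sqrt{2\pi/(Nh)}\,\Phi^{-1}(1 - \alpha/m)$; the intervals $[\hat A_k e^{-c}, \hat A_k e^{c}]$ around the estimated band masses $\hat A_k = \int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda$ should then, asymptotically and under the conjectured conditions, cover all true masses simultaneously with probability about $e^{-2\alpha}$. The sketch below is a minimal illustration under those assumptions; the function names, and the choice $\alpha = -\tfrac{1}{2}\log 0.90$ for a nominal 90 per cent band in the usage line, are hypothetical.

```python
import math
from statistics import NormalDist


def log_ratio_band(est_masses, n_obs, alpha):
    """Simultaneous band for spectral masses in m equal bands of (0, pi).

    est_masses: estimated masses A_k > 0, k = 0..m-1.
    Returns (lower, upper, nominal coverage exp(-2 alpha)).
    """
    m = len(est_masses)
    h = math.pi / m
    z = NormalDist().inv_cdf(1 - alpha / m)
    c = math.sqrt(2 * math.pi / (n_obs * h)) * z
    lower = [a * math.exp(-c) for a in est_masses]
    upper = [a * math.exp(c) for a in est_masses]
    return lower, upper, math.exp(-2 * alpha)


def max_log_statistic(est_masses, true_masses, n_obs):
    """max_k sqrt(N h / (2 pi)) |log(A_hat_k / A_k)|, the statistic in (4)."""
    h = math.pi / len(est_masses)
    scale = math.sqrt(n_obs * h / (2 * math.pi))
    return max(scale * abs(math.log(a / b)) for a, b in zip(est_masses, true_masses))
```

Usage: `lower, upper, cov = log_ratio_band(masses, 104, -0.5 * math.log(0.90))`. A set of true masses lies inside the band exactly when `max_log_statistic` is below $\Phi^{-1}(1 - \alpha/m)$, which is why the log form translates directly into a multiplicative band.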

In the limit theorem given above we first let $N \to \infty$ and then let $m \to \infty$. A limit theorem of much greater interest from the applied point of view is one in which $N$ and $m$ increase simultaneously. The limit theorem remains true when $N$ and $m$ increase simultaneously, provided $m \to \infty$ sufficiently slowly with respect to $N$; a simple diagonalization argument shows this. The question of greatest interest is how slowly $m$ should tend to infinity with respect to $N$.

In any case, as noted before, one must at least have $m \to \infty$ more slowly than $N$ if such a limit theorem is to hold. Moreover, if one were to deal with an estimate of the form

$$\frac{1}{h}\int_{kh}^{(k+1)h}\int_{-\pi}^{\pi} I_N(x)\,w_N(x-\lambda)\,dx\,d\lambda,$$

where $w_N(x)$ approaches a delta function at $x = 0$ as $N \to \infty$, and expect such a limit theorem to be valid, one would at least need

$$\int_{-h/2}^{h/2} w_N(x)\,dx = 1 + o(1) \qquad\text{and}\qquad \int_{-1/N}^{1/N} w_N(x)\,dx = o(1)$$

as $m, N \to \infty$. Note that if one could let $m \to \infty$ faster than $N^{2/7}$, the limit theorem (4) could be put in the following form

$$P\left[\max_k \sqrt{\frac{Nh}{2\pi}}\left|\log\frac{\frac{1}{h}\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda}{f((k+\frac{1}{2})h)}\right| < \Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)\right] \to e^{-2\alpha}.$$

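This is a rather convenient form, since it would lead to the following asymptotic confidence band

$$\exp\left\{-\sqrt{\frac{2\pi}{Nh}}\,\Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)\right\}\frac{1}{h}\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda < f\!\left((k+\tfrac{1}{2})h\right) < \exp\left\{\sqrt{\frac{2\pi}{Nh}}\,\Phi^{-1}\!\left(1 - \frac{\alpha}{m}\right)\right\}\frac{1}{h}\int_{kh}^{(k+1)h} I_N(\lambda)\,d\lambda,$$

$k = 0, 1, \ldots, m-1$, for the spectral density $f(\lambda)$ at the points $\lambda = \pi/2m, 3\pi/2m, \ldots, ((2m-1)/2m)\pi$, with asymptotic confidence coefficient $e^{-2\alpha}$. Of course one would expect to get the same type of confidence band in the case of an estimate

$$\frac{1}{h}\int_{kh}^{(k+1)h}\int_{-\pi}^{\pi} I_N(x)\,w_N(x-\lambda)\,dx\,d\lambda,$$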

where $w_N(x)$ is a weight function satisfying conditions of the type described above. Again, convenient choices for $w_N(x)$ would be either the Dirichlet kernel

$$w_N(x) = \frac{1}{2\pi}\,\frac{\sin\left(([N^a] + \frac{1}{2})x\right)}{\sin\frac{1}{2}x}$$

or the Fejér kernel

$$w_N(x) = \frac{1}{2\pi[N^a]}\,\frac{\sin^2\frac{1}{2}[N^a]x}{\sin^2\frac{1}{2}x},$$

where $0 < a < 1$ and $hN^a \to \infty$.
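The two kernels are easy to evaluate. The sketch assumes the normalizations written above, $w_N(x) = (2\pi)^{-1}\sin((n+\frac12)x)/\sin\frac12 x$ and $w_N(x) = (2\pi n)^{-1}\sin^2\frac12 nx/\sin^2\frac12 x$ with $n = [N^a]$, and checks numerically that each integrates to one over $(-\pi, \pi)$, as the condition $\int_{-h/2}^{h/2} w_N(x)\,dx = 1 + o(1)$ requires. The values $N = 200$ and $a = 0.5$ are illustrative only.

```python
import math


def dirichlet_kernel(x, n):
    """(1/2pi) sin((n + 1/2) x) / sin(x/2); its limit at x = 0 is (2n + 1)/(2pi)."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return (2 * n + 1) / (2 * math.pi)
    return math.sin((n + 0.5) * x) / (2 * math.pi * s)


def fejer_kernel(x, n):
    """(1/(2pi n)) sin^2(n x/2) / sin^2(x/2); its limit at x = 0 is n/(2pi)."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return n / (2 * math.pi)
    return math.sin(n * x / 2) ** 2 / (2 * math.pi * n * s * s)


def integrate(f, a, b, k=4000):
    """Composite midpoint rule on [a, b] (exact for low-degree trigonometric polynomials)."""
    step = (b - a) / k
    return step * sum(f(a + (j + 0.5) * step) for j in range(k))


N, a = 200, 0.5
n = math.floor(N ** a)  # [N^a]
print(n, integrate(lambda x: dirichlet_kernel(x, n), -math.pi, math.pi))
print(n, integrate(lambda x: fejer_kernel(x, n), -math.pi, math.pi))
```

The Fejér kernel is nonnegative, which is one reason for preferring it when a nondecreasing estimate of $F(\lambda)$ is wanted; the Dirichlet kernel takes negative values.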

If the observed process has a nonzero regression of the form given in section 6.8, one could subtract the estimated regression from the observed time series, as in section 6.8, and compute the estimate of the spectral density from the resulting series. The limit theorem for the asymptotic confidence band would again be expected to have the same form.

We have noted that the limit theorems conjectured and discussed here are independent of $\varepsilon$. However, the speed of convergence to the limiting form would clearly be affected by the magnitude of $\varepsilon$.

6.12. Spectral Analysis of Some Artificially Generated Time Series

Ten time series were generated for us in Stockholm, Sweden, under the auspices of the Institute of Mathematical Statistics of the Stockholms Högskola. Before commenting on the results of the spectral analysis of the time series, we will describe the way in which they were generated and discuss the reasons for choosing this mode of generation.

In every case the computation started with the choice of a sequence of random numbers $\xi_{-4}, \xi_{-3}, \xi_{-2}, \ldots, \xi_1, \ldots, \xi_M$. These random numbers should be regarded as independent observations from a population with a given probability distribution. Three probability distributions were chosen for $\xi$. They are

N: Normal with mean zero and variance one.

R: Rectangular distribution on the interval $0 < \xi < 100$.

T: A three-point distribution:
   $\xi = 1$ with probability 1/10,
   $\xi = 0$ with probability 8/10,
   $\xi = -1$ with probability 1/10.

The time series $x_t$ is computed by applying a moving average

$$x_t = \alpha_0\xi_t + \alpha_1\xi_{t-1} + \alpha_2\xi_{t-2} + \alpha_3\xi_{t-3} + \alpha_4\xi_{t-4} + \alpha_5\xi_{t-5}$$

to the sequence $\xi_{-4}, \xi_{-3}, \ldots, \xi_M$ of $M + 5$ random numbers. There are two alternative schemes for the coefficients of the moving average, which we shall call alternatives A and B.


Coefficients of the moving average

Alternative   α0     α1     α2     α3     α4     α5
A             0.1    0.5    0.2   -0.1    0.2    0.1
B             0.1    0.3   -0.2    0.1    0.1    0.1
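The generation scheme can be sketched as follows. Python's `random` module stands in for the original random-number source, which the text does not describe, so this is an assumption; the output is an analogue of series such as NA2 or TB1, not a reproduction of them. For alternative B the sketch uses the coefficients $(0.1, 0.3, -0.2, 0.1, 0.1, 0.1)$, the set that reproduces the spectral values tabulated in Tables 3 and 4.

```python
import random

COEFFS = {
    "A": [0.1, 0.5, 0.2, -0.1, 0.2, 0.1],
    "B": [0.1, 0.3, -0.2, 0.1, 0.1, 0.1],
}


def draw_xi(kind, rng):
    """One random number from distribution N, R or T."""
    if kind == "N":
        return rng.gauss(0.0, 1.0)          # normal, mean 0, variance 1
    if kind == "R":
        return rng.uniform(0.0, 100.0)      # rectangular on (0, 100)
    if kind == "T":
        u = rng.random()                    # 1 w.p. 1/10, -1 w.p. 1/10, 0 w.p. 8/10
        return 1 if u < 0.1 else (-1 if u < 0.2 else 0)
    raise ValueError(kind)


def moving_average(xi, coeffs):
    """x_t = sum_j a_j xi_{t-j}, t = 1..M, where the list xi holds xi_{-4}, ..., xi_M."""
    q = len(coeffs) - 1                     # number of lags (5)
    m = len(xi) - q                         # series length M
    # list index i stores xi_{i-4}, so xi_{t-j} sits at index t - j + q - 1
    return [sum(a * xi[t - j + q - 1] for j, a in enumerate(coeffs))
            for t in range(1, m + 1)]


def generate_series(kind, alt, length, seed=None):
    """Draw M + 5 random numbers and apply the moving average of alternative alt."""
    rng = random.Random(seed)
    xi = [draw_xi(kind, rng) for _ in range(length + 5)]
    return moving_average(xi, COEFFS[alt])
```

For instance, `generate_series("N", "A", 200, seed=1)` yields a series of the same kind as NA2.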

Graphs 2 and 3 give the spectral distribution function of the process $x_t$ for alternatives A and B respectively. Graphs 4 and 5 give the form of the spectral density of $x_t$ for alternatives A and B. The values of the spectral distribution function and of the spectral density for alternatives A and B at $\lambda = k\pi/16$, $k = 1, \ldots, 16$, are given in Tables 3 and 4 respectively. In all these cases the variance of $\xi_t$ is assumed to be one; to get the proper scale one must multiply the values by the variance $\sigma^2$ of $\xi_t$.
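The tabulated values can be recomputed from the coefficients. For a moving average $x_t = \sum_j \alpha_j\xi_{t-j}$ with $\operatorname{Var}\xi_t = 1$, the spectral density is $f(\lambda) = (2\pi)^{-1}[c_0 + 2\sum_{v\ge1} c_v\cos v\lambda]$ with $c_v = \sum_j \alpha_j\alpha_{j+v}$, and $F(\lambda) = \int_0^\lambda f(x)\,dx = (2\pi)^{-1}[c_0\lambda + 2\sum_{v\ge1} c_v\sin(v\lambda)/v]$. The sketch assumes $F$ is measured from $\lambda = 0$, which agrees with the tabulated $F(\pi) = .18$ and $.085$ (one half of $\sum_j\alpha_j^2$); for alternative B it uses $\alpha = (0.1, 0.3, -0.2, 0.1, 0.1, 0.1)$.

```python
import math

COEFFS = {
    "A": [0.1, 0.5, 0.2, -0.1, 0.2, 0.1],
    "B": [0.1, 0.3, -0.2, 0.1, 0.1, 0.1],
}


def autocov(a):
    """c_v = sum_j a_j a_{j+v}: covariances of the moving average for unit-variance xi."""
    q = len(a)
    return [sum(a[j] * a[j + v] for j in range(q - v)) for v in range(q)]


def spectral_density(lam, a):
    c = autocov(a)
    return (c[0] + 2 * sum(c[v] * math.cos(v * lam) for v in range(1, len(c)))) / (2 * math.pi)


def spectral_distribution(lam, a):
    """F(lam): integral of the spectral density over (0, lam)."""
    c = autocov(a)
    return (c[0] * lam + 2 * sum(c[v] * math.sin(v * lam) / v for v in range(1, len(c)))) / (2 * math.pi)


for k in range(1, 17):
    lam = k * math.pi / 16
    print(f"{k:2d}pi/16"
          f"  F_A={spectral_distribution(lam, COEFFS['A']):.6f}"
          f"  F_B={spectral_distribution(lam, COEFFS['B']):.6f}"
          f"  f_A={spectral_density(lam, COEFFS['A']):.6f}"
          f"  f_B={spectral_density(lam, COEFFS['B']):.6f}")
```

The first row, for example, agrees with the entries .030275, .007409, .144518 and .033755 in Tables 3 and 4.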

Table 3.
Table of values of the spectral distribution function.

λ          Alternative A   Alternative B
π/16       .030275         .007409
2π/16      .055277         .012703
3π/16      .072249         .015088
4π/16      .082063         .015548
5π/16      .088370         .016161
6π/16      .095450         .018814
7π/16      .106045         .024160
8π/16      .120346         .031465
9π/16      .136423         .039287
10π/16     .151591         .046430
11π/16     .163748         .052538
12π/16     .172063         .058048
13π/16     .176871         .063712
14π/16     .179136         .070086
15π/16     .179897         .077282
16π/16     .180000         .085000

The time series are of length $M = 104$ or $M = 200$. The following notation has been adopted for the time series. Time series NA2 is the time series of length 200 generated from normally distributed random numbers under alternative A. Time series RB1 is the time series of length 104 generated from rectangularly distributed random numbers under alternative B. Time series RB3 has the same properties as time series RB1, that is, $M = 104$; the suffix 3 has been used only to distinguish it from RB1. The following time series have been generated and analyzed:

RA1   RB1   NA1   NB1   TA1   TB1
RA3   RB3   NA2   NB2


We have noted that the limiting distribution of $\max_{0\le\lambda\le\pi}|F_N^*(\lambda) - F(\lambda)|/G^*(\pi)$ (see sections 6.5 and 6.6) depends on $\varepsilon$ and the spectral density $f(\lambda)$.
Table 4.
Table of values of the spectral density.

λ          Alternative A   Alternative B
π/16       .144518         .033755
2π/16      .107707         .019452
3π/16      .065994         .005749
4π/16      .037265         .000819
5π/16      .030809         .007125
6π/16      .043723         .020461
7π/16      .064318         .033306
8π/16      .079577         .039789
9π/16      .081785         .038813
10π/16     .070868         .033651
11π/16     .052271         .028981
12π/16     .032763         .027829
13π/16     .017087         .030357
14π/16     .006884         .034661
15π/16     .001584         .038364
16π/16     .000000         .039789

In the case of a normal process $x_t$, $\varepsilon = 0$ and there is no dependence on $f(\lambda)$. Graph 1 suggests that in general the dependence on $\varepsilon$ is much more important than the dependence on $f(\lambda)$. It is clear that confidence bands for the spectral distribution function based on the limit theorem in the normal case are conservative if $\varepsilon < 0$ and optimistic if $\varepsilon > 0$ (see section 6.7).

The suggested limiting distribution of $\max_k|\log(\Delta F^*(kh)/\Delta F(kh))|$ (see section 6.11) is independent of $\varepsilon$ and $f(\lambda)$. Here $\Delta F^*(kh)$ and $\Delta F(kh)$ are the estimated and true spectral mass in the interval $(kh, (k+1)h)$, where $h = \pi/m$ and $k = 0, 1, \ldots, m-1$. However, for a finite sample the distribution will certainly depend on $\varepsilon$ and $f(\lambda)$. Within bounds, it seems clear that the dependence of the distribution on $f(\lambda)$ will be much greater than the dependence on $\varepsilon$, since $\Delta F^*(kh)$ is a local estimate. If $f(\lambda)$ changes very rapidly in magnitude over a given range of $\lambda$, then for a fixed sample size $M$ one cannot expect $\Delta F^*(kh)$ to be as good an estimate of the spectral mass in intervals in this range as when $f(\lambda)$ changes slowly in magnitude. The same comment is valid for estimates of the spectral density. We shall discuss an interesting and relevant suggestion of J. W. Tukey in section 8.8.
A spectral analysis of ten time series cannot give much information about the distribution of the statistics of interest. However, it may give some interesting hints about the dependence of the distribution of the statistics on $\varepsilon$ and $f(\lambda)$. In order to get some idea of the dependence on $\varepsilon$ we have used three types of random numbers in the generation of the time series. In the case R of rectangularly distributed random numbers, $\varepsilon = -6/5 < 0$. In the case N of normally distributed random numbers $\varepsilon = 0$, while in the case T of a three-point distribution $\varepsilon = 2 > 0$. Alternatives A and B give two different spectral densities $f(\lambda)$. The spectral density of alternative A is more irregular than that of alternative B, and it decreases very rapidly to zero at $\lambda = \pi$.
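The quoted values of $\varepsilon$ coincide with the excess kurtosis $\mu_4/\sigma^4 - 3$ of the distribution of $\xi$; the sketch below takes this identification as its assumption and verifies the three numbers in exact rational arithmetic. For the normal distribution $\mu_4 = 3\sigma^4$, so $\varepsilon = 0$ needs no computation.

```python
from fractions import Fraction


def excess_kurtosis_discrete(values, probs):
    """mu_4 / sigma^4 - 3 for a finite distribution, computed exactly."""
    mean = sum(v * p for v, p in zip(values, probs))
    m2 = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    m4 = sum((v - mean) ** 4 * p for v, p in zip(values, probs))
    return m4 / m2 ** 2 - 3


def excess_kurtosis_uniform(length):
    """Rectangular distribution on an interval of the given length:
    central moments mu_2 = L^2/12 and mu_4 = L^4/80."""
    L = Fraction(length)
    return (L ** 4 / 80) / (L ** 2 / 12) ** 2 - 3


eps_T = excess_kurtosis_discrete([1, 0, -1],
                                 [Fraction(1, 10), Fraction(8, 10), Fraction(1, 10)])
eps_R = excess_kurtosis_uniform(100)
print(eps_R, eps_T)  # -6/5 2
```

The result for R does not depend on the length of the interval, so the choice of $(0, 100)$ only affects the scale factor $\sigma^2$.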

The product lag moments

$$C_v = \sum_{t=1}^{M-v} x_t x_{t+v},$$

$v = 0, 1, \ldots, 15$, were computed for the time series of length $M = 104$. For the time series of length $M = 200$, $C_v$, $v = 0, 1, \ldots, 20$, were computed. The estimate $F^*(\lambda)$ of $F(\lambda)$ is given by

when $M = 104$ and by

when $M = 200$. Both these estimates are constructed by using Fejér kernels (see section 6.9). We have chosen these estimates because they are nondecreasing functions of $\lambda$ of a simple form. The estimates $F^*(\lambda)$ of $F(\lambda)$ have been computed at $\lambda = k\pi/16$, $k = 1, \ldots, 16$, for each of the time series and are given in Table 5. Our estimate of $G(\pi)$ is

15 


_,-        i  a     i  »  a2  /     v 


when $M = 104$ and

i2  a        20 


n* ,    v       1     Co   .    1     v    Cv  /         v 


when $M = 200$. The values of $G^*(\pi)$ for the different time series are given in Table 6.
Table 5.
Table of values of $F^*(\lambda)$.

λ         TB1      TA1     NB2     NB1     NA2     NA1     RB1     RB3     RA3      RA1
π/16      .00067   .0049   .0052   .0049   .0227   .0143    5.84    1.74    15.87    27.85
2π/16     .00118   .0087   .0103   .0096   .0447   .0305   11.56    4.60    37.44    53.07
3π/16     .00156   .0117   .0139   .0121   .0626   .0457   14.54    6.99    52.76    69.52
4π/16     .00177   .0138   .0154   .0132   .0730   .0541   15.61    8.15    61.78    80.71
5π/16     .00201   .0156   .0171   .0143   .0803   .0600   16.74    9.06    68.37    88.93
6π/16     .00247   .0171   .0205   .0170   .0897   .0677   18.80   11.22    72.51    96.84
7π/16     .00310   .0191   .0270   .0216   .1057   .0767   22.22   15.24    77.36   109.60
8π/16     .00406   .0219   .0318   .0264   .1202   .0885   26.59   20.69    85.38   124.00
9π/16     .00550   .0242   .0380   .0309   .1338   .1018   31.41   26.09    94.04   135.34
10π/16    .00679   .0259   .0468   .0374   .1503   .1126   37.65   30.52   103.40   145.91
11π/16    .00766   .0277   .0521   .0446   .1674   .1222   45.30   34.68   114.52   154.72
12π/16    .00847   .0296   .0571   .0499   .1773   .1322   52.13   39.69   122.49   160.59
13π/16    .00928   .0311   .0621   .0558   .1822   .1406   58.11   45.14   126.93   165.08
14π/16    .01008   .0319   .0670   .0625   .1844   .1461   64.00   49.75   130.00   167.49
15π/16    .01078   .0322   .0752   .0684   .1860   .1487   69.69   55.39   131.86   169.12
16π/16    .01130   .0325   .0879   .0761   .1868   .1499   75.60   63.68   132.86   170.32

Table 6.
Table of values of $G^*(\pi)$.

Series    G*(π)
TB1       4.9748 × 10⁻⁴
TA1       4.4685 × 10⁻⁴
NB2       3.0757 × 10⁻³
NB1       2.1746 × 10⁻³
NA2       1.4994 × 10⁻²
NA1       8.6001 × 10⁻³
RB1       2.1404 × 10³
RB3       1.6017 × 10³
RA3       8.0661 × 10³
RA1       1.2293 × 10⁴

The true and estimated spectral distribution functions of time series TA1, NB1 and RA3 are given in Graphs 6 to 8. Confidence bands about the estimated distribution function, computed on the basis of the limit theorem for a normal process $x_t$, have been set up. In each case the narrowest confidence band containing the true distribution function has been drawn, so as to give a measure of the deviation between the true and the estimated distribution function. Note that the true distribution functions of all the time series except TB1 are contained in confidence bands of confidence coefficient .90. It is not surprising that the exceptional time series is TB1, since then $\varepsilon = 2 > 0$.

In section 6.11 it was noted that in estimating the spectral mass in an interval, the length of the interval ought to be larger than the bandwidth of the weight function $w_m(x)$ used. In this case it then seems reasonable to estimate the mass in intervals of length $\pi/8$. Here

$$\Delta F^*\!\left(\frac{k\pi}{8}\right) = F^*\!\left(\frac{(k+1)\pi}{8}\right) - F^*\!\left(\frac{k\pi}{8}\right)$$

is an estimate of

$$\Delta F\!\left(\frac{k\pi}{8}\right) = F\!\left(\frac{(k+1)\pi}{8}\right) - F\!\left(\frac{k\pi}{8}\right), \qquad k = 0, 1, \ldots, 7.$$

The estimated and true spectral masses are given in Tables 7 and 8 respectively. For all the time series computed under alternative B, the true spectral masses are contained in a confidence band of confidence coefficient .90 about the estimated spectral masses. However, for the time series computed under alternative A, the true spectral masses are not even contained in a confidence band of confidence coefficient .99. This is not too surprising. We had already noted that the spectral density of alternative A decreases to zero at $\lambda = \pi$. If we omit consideration of the spectral mass in the interval $(7\pi/8, \pi)$, a confidence band of confidence coefficient .90 will contain the true spectral masses in every case.

Table 7.
Table of the estimated spectral mass $\Delta F^*$ in the intervals $((k-1)\pi/8,\ k\pi/8)$, $k = 1, 2, \ldots, 8$.

k    TB1      TA1     NB2     NB1     NA2     NA1     RB1     RB3     RA3     RA1
1    .00118   .0087   .0103   .0096   .0447   .0305   11.56    4.60   37.44   53.07
2    .00059   .0051   .0051   .0036   .0283   .0236    4.05    3.55   24.34   27.64
3    .00070   .0033   .0051   .0038   .0167   .0136    3.19    3.07   10.73   16.13
4    .00159   .0048   .0113   .0094   .0305   .0208    7.79    9.47   12.87   27.16
5    .00273   .0040   .0150   .0110   .0301   .0241   11.06    9.83   18.02   21.91
6    .00168   .0037   .0103   .0125   .0270   .0196   14.48    9.17   19.09   14.68
7    .00161   .0023   .0099   .0126   .0071   .0139   11.87   10.06    7.51    6.90
8    .00122   .0006   .0209   .0136   .0024   .0038   11.60   13.93    2.86    2.83

Table 8.
Table of the true spectral mass $\Delta F$ in the intervals $((k-1)\pi/8,\ k\pi/8)$, $k = 1, 2, \ldots, 8$.

k    TB1       TA1       NB1 & NB2   NA1 & NA2   RB1 & RB3   RA1 & RA3
1    .002541   .011055   .012703     .055277     10.5858     46.0642
2    .000569   .005358   .002845     .026786      2.3709     22.3217
3    .000653   .002677   .003266     .013387      2.7216     11.1559
4    .002530   .004979   .012651     .024896     10.5426     20.7466
5    .002993   .006249   .014965     .031245     12.4709     26.0375
6    .002324   .004095   .011618     .020572      9.6816     17.0600
7    .002407   .001414   .012038     .007073     10.0317      5.8942
8    .002983   .000173   .014914     .000864     12.4283      0.7200


Graph 1. Graph of level curves of $A_\varepsilon(\alpha)$ as a function of $\alpha$ and $\varepsilon$ (level curves $A_\varepsilon(\alpha)$ = .900, .950, .990, .999; $\varepsilon$ from −2 to 12).

Graph 2. Graph of $F(\lambda)$, alternative A (horizontal axis $16\lambda/\pi$).

Graph 3. Graph of $F(\lambda)$, alternative B (horizontal axis $16\lambda/\pi$).

Graph 4. Graph of $f(\lambda)$, alternative A (horizontal axis $16\lambda/\pi$).

Graph 5. Graph of $f(\lambda)$, alternative B (horizontal axis $16\lambda/\pi$).

Graph 6. True and estimated spectral distribution function of series TA1. Confidence band corresponding to $A(\alpha) = .35$ and $\alpha = .982$.

Graph 7. True and estimated spectral distribution function of series NB1. Confidence band corresponding to $A(\alpha) = .173$ and $\alpha = .785$.

Graph 8. True and estimated spectral distribution function of series RA3. Confidence band corresponding to $A(\alpha) = .41$ and $\alpha = 1.039$.
CHAPTER 7

PROBLEMS IN LINEAR ESTIMATION

7.0. Preliminary Discussion

We discuss a variety of problems associated with regression analysis in this chapter. One of the first problems that comes to mind is that of distinguishing between two regressions. For simplicity let us consider the two simple hypotheses

$$H_j:\quad y_t = x_t + m_t^{(j)}, \qquad j = 0, 1,$$

where $x_t$ is assumed to be normally distributed with mean zero and spectral distribution function $F(\lambda)$. The two regression column vectors $m^{(0)} = (m_t^{(0)};\ t = 1, 2, \ldots, n)$ and $m^{(1)} = (m_t^{(1)};\ t = 1, 2, \ldots, n)$ are assumed to have real components. After having observed $y_1, y_2, \ldots, y_n$ we may want to test hypothesis $H_0$ against hypothesis $H_1$. Assume that the covariance matrix $R$ of the disturbance $x_t$ is nonsingular. Consider the two frequency functions corresponding to $H_0$ and $H_1$

$$f_j(y_1, y_2, \ldots, y_n) = \frac{1}{(2\pi)^{n/2}|R|^{1/2}}\exp\left\{-\tfrac{1}{2}(y - m^{(j)})^* R^{-1}(y - m^{(j)})\right\}, \qquad j = 0, 1.$$

We know that a most powerful test of $H_0$ against $H_1$ has the critical region (see Cramér [3])

$$W = \{y^* R^{-1}(m^{(1)} - m^{(0)}) > K\}.$$

We shall now assume the special form

$$m_t^{(j)} = \int_{-\pi}^{\pi} e^{it\lambda}\,d\varphi_j(\lambda), \qquad j = 0, 1, \tag{1}$$

for the regression sequences, where $\varphi_0(\lambda)$ and $\varphi_1(\lambda)$ are of bounded variation. Although this choice of $m_t^{(j)}$ includes the important case of a trigonometric regression, it is somewhat restricted. We shall see in later sections how to deal with regressions of a more general form.
Introduce the expected values

$$E_j[y^* R^{-1}(m^{(1)} - m^{(0)})] = \mu_j, \qquad j = 0, 1,$$

and the variance

$$D^2[y^* R^{-1}(m^{(1)} - m^{(0)})] = v,$$

which is the same for both hypotheses. Then it is clear that there will exist a consistent test of $H_0$ against $H_1$ if and only if the quantity

$$\tau = \frac{(\mu_1 - \mu_0)^2}{v}$$

tends to infinity with $n$.

We have by straightforward computation

$$\tau = \frac{[(m^{(1)} - m^{(0)})^* R^{-1}(m^{(1)} - m^{(0)})]^2}{(m^{(1)} - m^{(0)})^* R^{-1}(m^{(1)} - m^{(0)})} = (m^{(1)} - m^{(0)})^* R^{-1}(m^{(1)} - m^{(0)}).$$

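Now recall the estimation problem dealt with in section 2.6. There we considered the minimization of a quadratic form with a linear restraint,

$$c^* R c = \min, \qquad c^*\gamma = 1,$$

with $c = (c_1, c_2, \ldots, c_n)'$ and $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_n)'$, where the $\gamma$'s are given by

$$\gamma_t = \int_{-\pi}^{\pi} e^{it\lambda}\,d\gamma(\lambda).$$

In section 2.6 the time parameter ranged from $-n$ to $n$, but it is easily seen that the same result holds with the range 1 to $n$. Setting

$$R^{1/2}c = x,$$

we have

$$\|x\|^2 = \min, \qquad x^* R^{-1/2}\gamma = 1,$$

so that $x$, being the point of the hyperplane $x^* R^{-1/2}\gamma = 1$ nearest the origin, has to be perpendicular to that hyperplane, i.e., of the form

$$x = b R^{-1/2}\gamma,$$

where $b$ is a constant. The constant $b$ must then be

$$b = \frac{1}{\gamma^* R^{-1}\gamma},$$

so that the minimum value is

$$\|x\|^2 = \frac{1}{\gamma^* R^{-1}\gamma}.$$

Now putting

$$\gamma(\lambda) = \varphi_1(\lambda) - \varphi_0(\lambda),$$

so that $\gamma_t = m_t^{(1)} - m_t^{(0)}$, we see that $\tau$ is just the inverse of the minimum of $\|x\|^2$. But we have seen in section 2.6 that this minimum value converges to

$$\left(\int_{-\pi}^{\pi}\frac{|d\gamma(\lambda)|^2}{dF(\lambda)}\right)^{-1}$$

as $n \to \infty$. Hence there exists a consistent test of $H_0$ against $H_1$ if and only if the Hellinger integral diverges:

$$\int_{-\pi}^{\pi}\frac{|d\gamma(\lambda)|^2}{dF(\lambda)} = \infty. \tag{2}$$

This is in agreement with what one would guess: there is a consistent test only if $dF(\lambda)$ is small with respect to $|d(\varphi_1(\lambda) - \varphi_0(\lambda))|^2$ at frequencies $\lambda$ where $d\varphi_0(\lambda)$ and $d\varphi_1(\lambda)$ differ.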

For example, if we are testing a constant mean $m_t^{(0)} = m_0$ against $m_t^{(1)} = m_1$, then $\varphi_1(\lambda) - \varphi_0(\lambda)$ has as its only point of variation $\lambda = 0$, with the saltus $m_1 - m_0$. The value of the integral in (2) is

$$\lim_{\epsilon\to 0}\frac{(m_1 - m_0)^2}{F(\epsilon) - F(-\epsilon)},$$

so that (for $m_1 \ne m_0$) there is a consistent test of the hypothesis if and only if the spectral distribution function is continuous at $\lambda = 0$. To see why this is so, suppose that $F$ is discontinuous at $\lambda = 0$, so that $\lambda = 0$ is a discrete line in the spectrum of $x_t$. The process $x_t$ has the spectral representation $x_t = \int_{-\pi}^{\pi} e^{it\lambda}\,dz(\lambda)$. Since there is a discrete line in the spectrum at $\lambda = 0$, $z_0 = z(0+) - z(0-) \ne 0$, and we can write $x_t = z_0 + x_t'$. The process $x_t' = x_t - z_0$ has a spectral distribution function continuous at $\lambda = 0$, since the jump $z_0$ of $z(\lambda)$ at $\lambda = 0$ has been removed. Under the two hypotheses we have

$$H_j:\quad y_t = m_j + z_0 + x_t', \qquad j = 0, 1.$$

For a given realization the stochastic variable $z_0$ is just a constant independent of $t$, i.e., it behaves just like $m_j$. The presence of $z_0$ therefore makes the discrimination between $H_0$ and $H_1$ much harder, and it is clearly impossible to find a consistent test. One can similarly discuss trigonometric regression. What has been said above is clearly related to the discussion in section 2.5.
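A small numerical illustration of the constant-mean example. For $H_j: y_t = m_j + x_t$ the quantity whose divergence decides consistency is $\tau_n = (m_1 - m_0)^2\,\mathbf{1}^* R^{-1}\mathbf{1}$. The sketch compares white noise, $R = \sigma^2 I$, where $\tau_n = n(m_1 - m_0)^2/\sigma^2$ grows without bound, with a disturbance having a discrete line at $\lambda = 0$, namely $x_t = z + e_t$ with $\operatorname{Var} z = s$, so $R = \sigma^2 I + sJ$ ($J$ the matrix of ones). Then $\mathbf{1}^* R^{-1}\mathbf{1} = n/(\sigma^2 + ns) < 1/s$, so $\tau_n$ stays bounded. Both covariance models and the plain Gaussian-elimination solver are illustrative choices.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def tau(delta, cov):
    """(m1 - m0)' R^{-1} (m1 - m0) for a constant difference m1 - m0 = delta."""
    d = [delta] * len(cov)
    return sum(p * q for p, q in zip(d, solve(cov, d)))


def white_noise_cov(n, sigma2):
    return [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]


def line_at_zero_cov(n, sigma2, s):
    """Covariance of x_t = z + e_t with Var z = s: a discrete spectral line at lambda = 0."""
    return [[(sigma2 if i == j else 0.0) + s for j in range(n)] for i in range(n)]


for n in (10, 40, 120):
    print(n, tau(1.0, white_noise_cov(n, 1.0)), tau(1.0, line_at_zero_cov(n, 1.0, 0.5)))
```

With $m_1 - m_0 = 1$, $\sigma^2 = 1$ and $s = 0.5$, the first column of values grows like $n$ while the second never exceeds $1/s = 2$.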

In the rest of this chapter we shall deal with stationary disturbances that have an absolutely continuous spectrum with a positive, piecewise continuous spectral density. A typical problem is that of estimating the regression coefficient $\gamma$ when

$$y_t = \gamma\varphi_t + x_t, \qquad t = 1, \ldots, n,$$

where $\varphi_t$ is a given regression sequence, not necessarily of the form (1), and $x_t$ is stationary with mean zero. We shall restrict ourselves to estimates of regression coefficients that are linear in the observations $y_t$, $t = 1, \ldots, n$. There are various reasons for this restriction. Linear estimates are reasonable when the residual $x_t$ is a normal process; we shall discuss this point at greater length in the next paragraph. In many cases $x_t$ will not be normal, and there will be nonlinear estimates of $\gamma$ better than the linear estimates. Nonetheless, one rarely has much detailed information about the probability structure of the process $x_t$. In such a case linear estimates are particularly appealing, because they are of a simple form and their characteristics can be studied in great detail. This is not true of other classes of estimates. Moreover, the methods of this book are mainly linear methods, so that a detailed study of linear estimates is particularly appropriate.

If $x_t$ were normally distributed and the covariance matrix $R$ of $x_1, \ldots, x_n$ were fixed and known, $\varphi^* R^{-1} y$ would be a minimal sufficient statistic for the class of probability distributions of interest. Here $\varphi$ is the regression column vector. The linear estimate

$$c = \frac{\varphi^* R^{-1} y}{\varphi^* R^{-1}\varphi} \tag{3}$$

is an unbiased estimate of $\gamma$. In this chapter mean square error is the criterion by which we judge estimates. In formal statistical language the mean square error

$$E|c - \gamma|^2$$

is the risk function. One can then show that (3) is a minimax estimate of $\gamma$ and an admissible estimate of $\gamma$, using the techniques employed in Lehmann-Hodges [1]. In this chapter, we will be interested in finding out when there are good linear estimates whose form does not depend on $R$.
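A minimal sketch of estimate (3), $c = \varphi^* R^{-1} y/\varphi^* R^{-1}\varphi$, next to the simple estimate $\varphi^* y/\varphi^*\varphi$, whose form does not depend on $R$. Both are unbiased; the variance of (3) is $1/(\varphi^* R^{-1}\varphi)$ and that of the simple estimate is $\varphi^* R\varphi/(\varphi^*\varphi)^2$. The moving-average covariance $r_0 = 1 + \theta^2$, $r_1 = \theta$, the linear trend $\varphi_t = t$, and all function names are illustrative assumptions; real $\varphi$ and a symmetric $R$ are assumed throughout.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def ma1_cov(n, theta):
    """Covariance matrix of x_t = xi_t + theta * xi_{t-1}, Var xi_t = 1."""
    r0, r1 = 1 + theta * theta, theta
    return [[r0 if i == j else (r1 if abs(i - j) == 1 else 0.0) for j in range(n)]
            for i in range(n)]


def gls_estimate(phi, cov, y):
    """Estimate (3): phi' R^{-1} y / phi' R^{-1} phi."""
    w = solve(cov, phi)  # R^{-1} phi
    return dot(w, y) / dot(w, phi)


def gls_variance(phi, cov):
    return 1.0 / dot(phi, solve(cov, phi))


def simple_variance(phi, cov):
    """Variance of phi' y / phi' phi, an estimate that ignores R."""
    r_phi = [dot(row, phi) for row in cov]
    return dot(phi, r_phi) / dot(phi, phi) ** 2


n, theta = 30, 0.8
phi = [float(t) for t in range(1, n + 1)]  # hypothetical regression sequence phi_t = t
R = ma1_cov(n, theta)
print(gls_variance(phi, R), simple_variance(phi, R))
```

The gap between the two printed variances is what is at stake in asking when a linear estimate not depending on $R$ is nearly as good as (3).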

Suppose  there  is  a  sequence  of  linear  consistent  estimates  cn  of  y 

n 

cn=  2  a{tn)yt, 

1 

where  by  consistency  we  mean  that  E\cn—  y|2->0  as  n^oo.  Note  that 
this  is  not  quite  the  usual  statistical  usage  of  the  term  consistency.  Then 
mean  square  error  is 

E  I  cn  ~  7 12  =  E  |  cn  -  E  cn  |2  +  |  E  cn  -  y  |2 


"J 


2>(tn)eiU 


f{X)dX  +  \y 


5>(tnV<-i 

1 


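As this has to tend to zero, we must have

$$\lim_{n\to\infty}\sum_{t=1}^{n} a_t^{(n)}\varphi_t = 1,$$

i.e., we must have an asymptotically unbiased estimate. As

$$\int_{-\pi}^{\pi}\Big|\sum_{t=1}^{n} a_t^{(n)}e^{it\lambda}\Big|^2 f(\lambda)\,d\lambda \ \ge\ 2\pi\min_\lambda f(\lambda)\sum_{t=1}^{n}|a_t^{(n)}|^2,$$

we also have

$$\lim_{n\to\infty}\sum_{t=1}^{n}|a_t^{(n)}|^2 = 0.$$

But Schwarz' inequality, $\big|\sum_t a_t^{(n)}\varphi_t\big|^2\le\sum_t|a_t^{(n)}|^2\sum_t|\varphi_t|^2$, then tells us that

$$\lim_{n\to\infty}\sum_{t=1}^{n}|\varphi_t|^2 = \infty \tag{4}$$

is a necessary condition for the existence of a consistent estimate. It is immediately seen to be also a sufficient condition, since the unbiased estimate

$$\frac{\sum_{t=1}^{n}\bar\varphi_t\,y_t}{\sum_{t=1}^{n}|\varphi_t|^2}$$

has a mean square error not exceeding

$$\frac{2\pi\max_\lambda f(\lambda)}{\sum_{t=1}^{n}|\varphi_t|^2}\ \to\ 0$$

as $n\to\infty$. It is therefore appropriate to introduce the condition (4) in the following discussion.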
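The sufficiency argument can be checked numerically. The sketch below is ours, not the authors': it assumes, purely for illustration, AR(1) noise $x_t = \rho x_{t-1} + \eta_t$ with innovation variance $\sigma^2$, for which $r_h = \sigma^2\rho^{|h|}/(1-\rho^2)$ and $2\pi\max_\lambda f(\lambda) = \sigma^2/(1-|\rho|)^2$. It computes the exact mean square error of $\sum\bar\varphi_t y_t/\sum|\varphi_t|^2$ and compares it with the bound $2\pi\max f/\sum|\varphi_t|^2$. The function names are hypothetical.

```python
import math


def ar1_autocov(h, rho, sigma2=1.0):
    """r_h = E x_{t+h} x_t for the (assumed) AR(1) noise x_t = rho*x_{t-1} + eta_t."""
    return sigma2 * rho ** abs(h) / (1.0 - rho * rho)


def mse_normalized_estimate(phi, rho, sigma2=1.0):
    """Exact MSE of c = sum(conj(phi_t) y_t) / sum|phi_t|^2.

    c is unbiased, so its MSE equals its variance
    sum_{u,v} conj(phi_u) r_{u-v} phi_v / (sum|phi_t|^2)^2.
    """
    norm2 = sum(abs(p) ** 2 for p in phi)
    var = 0.0
    for u, pu in enumerate(phi):
        for v, pv in enumerate(phi):
            var += (pu.conjugate() * ar1_autocov(u - v, rho, sigma2) * pv).real
    return var / norm2 ** 2


def mse_bound(phi, rho, sigma2=1.0):
    """2*pi*max f / sum|phi_t|^2, where 2*pi*max f = sigma2 / (1 - |rho|)^2 for AR(1)."""
    norm2 = sum(abs(p) ** 2 for p in phi)
    return sigma2 / ((1.0 - abs(rho)) ** 2 * norm2)


# A constant mean (phi_t = 1) and a cosine regressor, n = 200, rho = 0.5.
n = 200
for phi in ([1.0] * n, [math.cos(0.7 * t) for t in range(1, n + 1)]):
    print(mse_normalized_estimate(phi, 0.5), "<=", mse_bound(phi, 0.5))
```

Both regressors satisfy (4), and the exact error stays below the bound, which itself decreases like $1/n$.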

We have already restricted ourselves to estimates linear in the observations. The additional assumption of unbiasedness is now introduced. This restriction can be motivated by noting that the linear unbiased estimates are the only linear estimates of γ whose mean square error is a bounded function of γ. This is not a serious restriction for the type of estimation problem we deal with. It is only when we have rather precise a priori information concerning the true value of γ that it would be advantageous to allow the estimates under consideration to be biased.

7.1.  Estimating  Regression  Coefficients 

We  have  already  noted  that  in  many  cases  the  observed  process  cannot 
be  assumed  to  have  mean  value  zero.  For  example,  in  measuring  the  current 
at  some  place  in  a  noisy  network  there  might  be  a  dominating  direct  current 
or  alternating  current  component  different  from  zero. 

Let  the  observed  process  be  yt  =  mt+  xt,  where  mt  =  Eyt  is  the  mean 
value  sequence  and  xt,  E  xt  =  0,  is  the  stationary  residual.  There  are  in 
general  the  two  following  problems.  In  characterizing  the  statistical  structure 
of  the  noise  xt,  one  is  quite  often  content  with  knowledge  of  the  spectrum. 
The  other  problem  is  to  estimate  the  mean  value  component  mt.  It  is  clear 
that  if  mt  is  completely  arbitrary  one  cannot  get  any  consistent  estimate  of 
it.  On  the  other  hand,  in  many  situations  one  is  helped  by  the  fact  that  the 
mean  value  sequence  can  be  given  in  the  regression  form  (see  section  2.5) 

$$m_t = \sum_{\nu=1}^{p}\gamma_\nu\varphi_t^{(\nu)}\qquad\text{or}\qquad m = \Phi\gamma = (\varphi^{(1)},\varphi^{(2)},\dots,\varphi^{(p)})\,\gamma,$$

where

$$m = \begin{pmatrix}m_1\\ \vdots\\ m_n\end{pmatrix}\ \text{is the mean value vector},\qquad \varphi^{(j)} = \begin{pmatrix}\varphi_1^{(j)}\\ \vdots\\ \varphi_n^{(j)}\end{pmatrix},\ j = 1,\dots,p,\ \text{are the known regression vectors, and}$$

$$\gamma = \begin{pmatrix}\gamma_1\\ \vdots\\ \gamma_p\end{pmatrix}$$

is the vector whose components, the regression coefficients, are in general unknown. As one example we mention the important case where

$$\varphi_t^{(\nu)} = \cos(\lambda_\nu t + \theta_\nu),$$

with $\lambda_1 = \theta_1 = 0$; $\gamma_1$ is the direct current amplitude and $\gamma_\nu$, $\nu = 2,3,\dots,p$, are the alternating current amplitudes corresponding to the frequencies $\lambda_\nu$. This is a trigonometric regression. Another case is the so-called polynomial regression, where $\varphi_t^{(\nu)} = t^\nu$. We should also mention the important case of pulse regression; here $\varphi_t^{(\nu)}$ is periodic in $t$ but is not a simple cosine or sine term.

One  way  to  look  upon  these  regression  problems  is  to  consider  them  as 
questions  related  to  the  construction  of  a  device  optimal  in  the  linear  sense 
for  the  detection  of  signals  in  a  noisy  channel  (i.e.,  testing  a  linear  hypothesis) 
or  for  measuring  the  strength  of  the  signal  (linear  estimation).  In  the  last 
few  years  problems  of  this  type  have  attracted  a  good  deal  of  attention  on 
the  part  of  physicists  and  communication  engineers  (see  e.g.,  Lawson- 
Uhlenbeck  [1]  and  papers  in  issues  of  the  Journal  of  Applied  Physics  from 
1952  to  the  present). 

When  estimating  the  noise  spectrum  we  have  to  eliminate  the  signal  and 
then  treat  the  remainder  by  methods  like  those  described  in  the  previous 
chapters.  The  linear  estimates  of  the  regression  coefficients  used  for  this 
purpose  in  section  6.8  will  be  shown  to  be  asymptotically  optimal  among 
linear  estimates  in  the  present  chapter  when  the  regression  is  a  polynomial 
or  trigonometric  regression. 

As the class of admissible estimates we will choose those which are linear in the observed variables and unbiased,

$$c_\nu = \sum_{t=1}^{n} a_t^{(\nu)} y_t,\quad \nu = 1,2,\dots,p,\qquad Ec = \gamma. \tag{1}$$

By a consistent estimate (in the linear sense) we mean an estimate of the form (1) each of whose components $c_\nu$ converges in mean square to the true value $\gamma_\nu$ as $n\to\infty$.

Consider the covariance matrix $\Gamma(c)$ of the estimate $c$ of γ,

$$\Gamma(c) = \{E(c_\nu-\gamma_\nu)\overline{(c_\mu-\gamma_\mu)};\ \nu,\mu = 1,2,\dots,p\}.$$

In section 2.6 we proved that if the covariance matrix $R$ of the process $x_t$ is nonsingular (this assumption will be adhered to throughout this chapter) there is at least one estimate $c_0$ such that

$$\Gamma(c_0)\le\Gamma(c)$$

for every unbiased linear estimate $c$. Such an estimate $c_0$ is called an efficient estimate of γ (in the linear sense).

One could then define efficiency as the ratio $\det\Gamma(c_0)/\det\Gamma(c)$ and asymptotic efficiency as the limit of this ratio as $n\to\infty$, if it exists. We refer to Grenander [4], who uses a method similar to that of this chapter.


7.2.  The  Regression  Spectrum 
Consider  for  a  fixed  r  the  sequence 


$$\Phi_n^{(r)} = \sum_{t=1}^{n}|\varphi_t^{(r)}|^2.$$

We have to assume that $\Phi_n^{(r)}$ approaches infinity with $n$, since otherwise no consistent estimate of $\gamma_r$ exists, as has been shown. On the other hand $\Phi_n^{(r)}$ must not increase too fast. We assume that

$$\lim_{n\to\infty}\frac{\Phi_{n+h}^{(r)}}{\Phi_n^{(r)}} = 1 \tag{1}$$

for each $h$. If the limiting relation (1) holds for $h = 1$, it holds for all integral $h$. If (1) holds, $\Phi_n^{(r)}$ is said to be slowly increasing. This terminology differs somewhat from the customary one. We will also assume the existence of the limits

$$R_h^{(r,s)} = \lim_{n\to\infty}\frac{\sum_{t=1}^{n}\varphi_{t+h}^{(r)}\bar\varphi_t^{(s)}}{\sqrt{\Phi_n^{(r)}\Phi_n^{(s)}}} \tag{2}$$

for $1\le r,s\le p$ and all nonnegative integers $h$. In order to deal with negative values of $h$ we define $\varphi_t^{(r)} = 0$ for $t\le 0$, and then $R_h^{(r,s)}$ is defined by (2) for all integral values of $h$. For $h>0$ we have

$$\frac{\sum_{t=1}^{n}\varphi_{t-h}^{(r)}\bar\varphi_t^{(s)}}{\sqrt{\Phi_n^{(r)}\Phi_n^{(s)}}} = \overline{\left(\frac{\sum_{u=1}^{n-h}\varphi_{u+h}^{(s)}\bar\varphi_u^{(r)}}{\sqrt{\Phi_{n-h}^{(r)}\Phi_{n-h}^{(s)}}}\right)}\sqrt{\frac{\Phi_{n-h}^{(r)}\Phi_{n-h}^{(s)}}{\Phi_n^{(r)}\Phi_n^{(s)}}},$$

and on taking the limits of both sides and using the slowly increasing character of $\Phi_n^{(r)}$ (see (1)) we have

$$R_{-h}^{(r,s)} = \overline{R_h^{(s,r)}},\qquad\text{i.e.,}\qquad R_{-h} = R_h^*.$$
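Definition (2) and the slowly increasing condition (1) are easy to explore at finite $n$. The following pure-Python sketch uses hypothetical helper names (`Phi`, `R_n`, `growth_ratio`) and sets $\varphi_t = 0$ for $t\le 0$ as in the text. It shows that a linear trend gives a slowly increasing $\Phi_n$ while $2^t$ does not, that the constant/trend pair has $R_0^{(1,2)}$ close to $\sqrt3/2$, and that $R_{-h}^{(r,s)}\approx\overline{R_h^{(s,r)}}$.

```python
import cmath
import math


def Phi(phi, n):
    """Phi_n = sum_{t=1}^n |phi(t)|^2; phi is a function of the time index t."""
    return sum(abs(phi(t)) ** 2 for t in range(1, n + 1))


def R_n(phi_r, phi_s, h, n):
    """Finite-n version of (2): sum_{t=1}^n phi_r(t+h) conj(phi_s(t)) / sqrt(Phi_n^(r) Phi_n^(s)),
    with phi_r(t) taken to be 0 for t <= 0, as in the text."""
    total = 0j
    for t in range(1, n + 1):
        if t + h >= 1:
            total += phi_r(t + h) * complex(phi_s(t)).conjugate()
    return total / math.sqrt(Phi(phi_r, n) * Phi(phi_s, n))


def growth_ratio(phi, n, h=1):
    """Phi_{n+h} / Phi_n; this tends to 1 when Phi_n is slowly increasing in the sense of (1)."""
    return Phi(phi, n + h) / Phi(phi, n)


def one(t):
    return 1.0                      # constant regressor


def lin(t):
    return float(t)                 # linear trend


def wave(t):
    return cmath.exp(0.9j * t)      # e^{i t lambda} with lambda = 0.9


print(growth_ratio(lin, 10_000))                # close to 1: slowly increasing
print(growth_ratio(lambda t: 2.0 ** t, 50))     # close to 4: not slowly increasing
print(R_n(one, lin, 0, 5000))                   # close to sqrt(3)/2
print(R_n(wave, wave, 3, 2000))                 # e^{3 * 0.9 i} up to rounding
```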

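Let α be an arbitrary $p$-vector and consider the quadratic form

$$a_\nu = \alpha^* R_\nu\,\alpha = \lim_{n\to\infty}\sum_{r,s=1}^{p}\frac{\bar\alpha_r\alpha_s}{\sqrt{\Phi_n^{(r)}\Phi_n^{(s)}}}\sum_{t=1}^{n}\varphi_{t+\nu}^{(r)}\bar\varphi_t^{(s)},$$

so that, for integers ν and μ,

$$a_{\nu-\mu} = \lim_{n\to\infty}\sum_{t=1}^{n}\Big(\sum_{r=1}^{p}\frac{\bar\alpha_r\varphi_{t+\nu}^{(r)}}{\sqrt{\Phi_n^{(r)}}}\Big)\overline{\Big(\sum_{s=1}^{p}\frac{\bar\alpha_s\varphi_{t+\mu}^{(s)}}{\sqrt{\Phi_n^{(s)}}}\Big)}.$$

This relation is proved just as before, using the slowly increasing character of $\Phi_n^{(r)}$. Let $k$ be an $m$-vector, where $m$ is arbitrary. Then

$$\sum_{\nu,\mu=1}^{m}\bar k_\nu\,a_{\nu-\mu}\,k_\mu = \lim_{n\to\infty}\sum_{t=1}^{n}\Big|\sum_{\nu=1}^{m}\sum_{r=1}^{p}\frac{\bar k_\nu\bar\alpha_r\varphi_{t+\nu}^{(r)}}{\sqrt{\Phi_n^{(r)}}}\Big|^2\ \ge\ 0,$$

so that $\alpha^* R_\nu\alpha$ is a nonnegative definite sequence. It then follows (see section 1.8) that

$$R_h = \int_{-\pi}^{\pi}e^{ih\lambda}\,dM(\lambda), \tag{3}$$

where $\Delta M = M(\lambda_2)-M(\lambda_1)$ is a nonnegative definite matrix for every interval $(\lambda_1,\lambda_2)$.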

Assume in the following that $R_0 = M(\pi)-M(-\pi) = M$ is nonsingular. Were $M$ singular, the $n$-vectors $\varphi^{(1)},\varphi^{(2)},\dots,\varphi^{(p)}$ would in a sense be asymptotically linearly dependent. By this we do not mean a strict linear dependence for finite $n$. It is then intuitively plausible that one could not estimate all the regression coefficients consistently, but only some linear functions of them.

$M(\lambda)$ is called the spectral distribution function of the regression vectors $\varphi^{(1)},\varphi^{(2)},\dots,\varphi^{(p)}$.

We are interested in comparing the least squares estimates $c_L$ and the minimum variance unbiased estimates $c_0$ of γ as $n\to\infty$. Their covariance matrices are

$$E(c_L-\gamma)(c_L-\gamma)^* = (\Phi^*\Phi)^{-1}\Phi^* R\,\Phi(\Phi^*\Phi)^{-1}$$

and

$$E(c_0-\gamma)(c_0-\gamma)^* = (\Phi^* R^{-1}\Phi)^{-1},$$

respectively, where $R$ is the covariance matrix of $x_1,\dots,x_n$ (see section 2.5). We know that

$$E(c_L-\gamma)(c_L-\gamma)^* = (\Phi^*\Phi)^{-1}\Phi^* R\,\Phi(\Phi^*\Phi)^{-1}\ \ge\ E(c_0-\gamma)(c_0-\gamma)^* = (\Phi^* R^{-1}\Phi)^{-1}.$$

We introduce the diagonal matrix

$$D_n = \begin{pmatrix}(\Phi_n^{(1)})^{1/2} & & 0\\ & \ddots & \\ 0 & & (\Phi_n^{(p)})^{1/2}\end{pmatrix}$$

and say that the estimate $c_L$ is asymptotically efficient if

$$\lim_{n\to\infty}D_n E(c_L-\gamma)(c_L-\gamma)^* D_n = \lim_{n\to\infty}D_n E(c_0-\gamma)(c_0-\gamma)^* D_n. \tag{4}$$


We have pre- and post-multiplied by $D_n$ in this definition of asymptotic efficiency to ensure that the matrices remain finite (do not vanish or diverge) as $n\to\infty$. We would use the same definition of asymptotic efficiency for a general linear unbiased estimate $c$. In our discussion of asymptotic efficiency we shall, for mathematical convenience and simplicity, allow the stochastic process $y_t$ and the regression components $\varphi_t^{(j)}$ to be complex-valued. The case of greatest interest is of course that in which $y_t$ and $\varphi_t^{(j)}$ are real-valued. We shall specialize to the real-valued case after discussing the complex-valued case, and state the corresponding results for the real-valued case. Let λ be a point such that for any interval $I = (\lambda_1,\lambda_2)$, $\lambda_1<\lambda<\lambda_2$, containing λ, $M(\lambda_2)-M(\lambda_1) = \Delta M(I) > 0$. Note that $\Delta M(I) > 0$ means that $\Delta M(I)$ is a nonnegative definite matrix and is not the null matrix. The set of all such points λ is called the regression spectrum and will be denoted by $S$.

7.3. Asymptotic Expressions for the Covariance Matrices

When estimating the regression coefficients $\gamma_\nu$ one should in theory use the best linear estimate, which is given by (see section 2.5)

$$c_0 = (\Phi^* R^{-1}\Phi)^{-1}\Phi^* R^{-1} y.$$

This has minimum variance among all unbiased linear estimates. Now this is usually not a very practical procedure. First of all, $R$ is in general not known, although it can be estimated if we have a sufficiently large sample. Secondly, even when we know $R$ the expression for $c_0$ is often quite complicated and requires tedious numerical computations involving the inversion of two matrices of high order.

Fortunately, in the case of some important types of regression we can find a simple estimate that is asymptotically efficient for all positive and piecewise continuous spectral densities $f(\lambda)$. This is the least squares estimate, which is given by

$$c_L = (\Phi^*\Phi)^{-1}\Phi^* y.$$

The matrix $\Phi^*\Phi$ (of order $p\times p$) is usually much easier to invert than $R$ (of order $n\times n$). We know (see section 2.5) that $c_L$ is unbiased. The covariance matrix of the least squares estimate is given in the previous section.
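To make the computational point concrete, here is a minimal pure-Python sketch (the helper names `solve` and `least_squares` are ours, and no numerical library is assumed). It forms the $p\times p$ normal equations $\Phi^*\Phi\,c = \Phi^* y$ and solves them by Gaussian elimination, so that beyond the inner products only $p\times p$ work is needed.

```python
import cmath


def solve(A, b):
    """Solve the small p x p system A x = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [[complex(a) for a in row] + [complex(bi)] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * p
    for r in reversed(range(p)):
        x[r] = (M[r][p] - sum(M[r][c] * x[c] for c in range(r + 1, p))) / M[r][r]
    return x


def least_squares(columns, y):
    """c_L = (Phi* Phi)^{-1} Phi* y, where columns[j] holds the regression vector phi^(j)."""
    p = len(columns)
    gram = [[sum(complex(a).conjugate() * b for a, b in zip(columns[j], columns[k]))
             for k in range(p)] for j in range(p)]
    rhs = [sum(complex(a).conjugate() * yt for a, yt in zip(columns[j], y)) for j in range(p)]
    return solve(gram, rhs)


# Noise-free checks: the coefficients are recovered up to rounding.
n = 50
cols = [[1.0] * n, [float(t) for t in range(1, n + 1)]]          # polynomial regression
y = [2 + 0.5 * t for t in range(1, n + 1)]
print(least_squares(cols, y))

lam = 1.3
cols2 = [[1.0] * 80, [cmath.exp(1j * lam * t) for t in range(1, 81)]]   # trigonometric
y2 = [3 + (1 + 2j) * cmath.exp(1j * lam * t) for t in range(1, 81)]
print(least_squares(cols2, y2))
```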
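We shall first consider the asymptotic behavior of the matrix

$$D_n E(c_L-\gamma)(c_L-\gamma)^* D_n = D_n(\Phi^*\Phi)^{-1}\Phi^* R\,\Phi(\Phi^*\Phi)^{-1}D_n$$

for large samples. First let us assume that the disturbance is of the moving average type

$$x_t = \sum_{\nu=0}^{a} a_\nu\,\eta_{t-\nu},$$

where $\{\eta_t\}$ is white noise with variance one. Then the spectral density is a trigonometric polynomial

$$f(\lambda) = \sum_{\nu=-a}^{a} f_\nu e^{-i\nu\lambda},$$

and the covariances are

$$r_\nu = \begin{cases}2\pi f_\nu, & |\nu|\le a,\\ 0, & |\nu|>a.\end{cases}$$

Then

$$\{D_n^{-1}\Phi^* R\,\Phi D_n^{-1}\}_{\nu\mu} = \sum_{u,v=1}^{n}\frac{\bar\varphi_u^{(\nu)}\,r_{u-v}\,\varphi_v^{(\mu)}}{\sqrt{\Phi_n^{(\nu)}\Phi_n^{(\mu)}}} = 2\pi\sum_{m=0}^{a}f_m\sum_{v=1}^{n-m}\frac{\bar\varphi_{v+m}^{(\nu)}\varphi_v^{(\mu)}}{\sqrt{\Phi_n^{(\nu)}\Phi_n^{(\mu)}}} + 2\pi\sum_{m=-a}^{-1}f_m\sum_{u=1}^{n+m}\frac{\bar\varphi_u^{(\nu)}\varphi_{u-m}^{(\mu)}}{\sqrt{\Phi_n^{(\nu)}\Phi_n^{(\mu)}}}$$

for $n>a$. As $n\to\infty$ each of the $2a+1$ terms tends to a finite limit, as is easily seen by using the slowly increasing character of the $\Phi_n^{(r)}$'s. The limit is

$$2\pi\sum_{m=-a}^{a}f_m R_m,$$

so that on using the representation (7.2.3) we have

$$\lim_{n\to\infty}D_n^{-1}\Phi^* R\,\Phi D_n^{-1} = 2\pi\int_{-\pi}^{\pi}f(-\lambda)\,dM(\lambda).$$

Now

$$\lim_{n\to\infty}D_n^{-1}\Phi^*\Phi D_n^{-1} = R_0 = M,$$

so that, since $D_n(\Phi^*\Phi)^{-1}D_n = (D_n^{-1}\Phi^*\Phi D_n^{-1})^{-1}$,

$$\lim_{n\to\infty}D_n E(c_L-\gamma)(c_L-\gamma)^* D_n = M^{-1}\,2\pi\int_{-\pi}^{\pi}f(-\lambda)\,dM(\lambda)\,M^{-1}$$

as $M$ is nonsingular.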
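The moving-average computation can be checked numerically for a single regressor $\varphi_t = e^{it\lambda_0}$, for which $D_n^2 = n$ and the regression spectrum is the single point $\lambda_0$ with unit jump. The sketch below assumes real MA coefficients (hypothetical values), so that $f$ is even and the distinction between $f(\lambda_0)$ and $f(-\lambda_0)$ disappears; then $2\pi f(\lambda) = \sum_m r_m e^{-im\lambda} = |\sum_\nu a_\nu e^{i\nu\lambda}|^2$. The double sum is organised by lag $m = u-v$, as in the text, so only $2a+1$ diagonals are visited. Function names are illustrative.

```python
import cmath


def ma_autocov(a, m):
    """r_m = sum_nu a_nu a_{nu+|m|} for x_t = sum_{nu=0}^{alpha} a_nu eta_{t-nu}, Var(eta_t) = 1."""
    m = abs(m)
    return sum(a[nu] * a[nu + m] for nu in range(len(a) - m)) if m < len(a) else 0.0


def normalized_form(a, lam0, n):
    """D_n^{-1} Phi* R Phi D_n^{-1} for the single regressor phi_t = e^{i t lam0} (D_n^2 = n).

    Only the diagonals |u - v| <= alpha of R are nonzero, so the sum runs over the lag m = u - v.
    """
    alpha = len(a) - 1
    total = 0j
    for m in range(-alpha, alpha + 1):
        r_m = ma_autocov(a, m)
        for v in range(max(1, 1 - m), min(n, n - m) + 1):  # keeps u = v + m inside 1..n
            u = v + m
            total += cmath.exp(-1j * u * lam0) * r_m * cmath.exp(1j * v * lam0)
    return total / n


def two_pi_f(a, lam):
    """2*pi*f(lam) = |sum_nu a_nu e^{i nu lam}|^2 for real MA coefficients a."""
    return abs(sum(c * cmath.exp(1j * nu * lam) for nu, c in enumerate(a))) ** 2


a = [1.0, 0.6, -0.3]   # hypothetical MA(2) coefficients
print(normalized_form(a, 1.1, 2000), two_pi_f(a, 1.1))
```

The two printed numbers agree up to a term of order $1/n$, which comes from the $|m|$ boundary terms dropped in each diagonal.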

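From now on we shall assume that the spectral density $f(\lambda)$ is piecewise continuous and positive. To avoid trivial complications we shall not allow the possible discontinuities of $f(-\lambda)$ to coincide with those of $M(\lambda)$. We can then approximate $f(\lambda)$ from above and below,

$$f_1(\lambda)\le f(\lambda)\le f_2(\lambda),$$

where $f_1(\lambda)$ and $f_2(\lambda)$ are finite trigonometric polynomials. It is clear that for the corresponding covariance matrices we have

$$R_1\le R\le R_2.$$

This is seen immediately by considering the corresponding quadratic forms

$$Z^* R_j Z = \int_{-\pi}^{\pi}\Big|\sum_{\nu=1}^{n}z_\nu e^{i\nu\lambda}\Big|^2 f_j(\lambda)\,d\lambda,\quad j = 1,2,\qquad Z^* R Z = \int_{-\pi}^{\pi}\Big|\sum_{\nu=1}^{n}z_\nu e^{i\nu\lambda}\Big|^2 f(\lambda)\,d\lambda.$$

Then clearly for any vector $Z$

$$Z^* M^{-1}\,2\pi\!\int_{-\pi}^{\pi}f_1(-\lambda)\,dM(\lambda)\,M^{-1}Z\ \le\ \liminf_{n\to\infty}Z^* D_n E(c_L-\gamma)(c_L-\gamma)^* D_n Z\ \le\ \limsup_{n\to\infty}Z^* D_n E(c_L-\gamma)(c_L-\gamma)^* D_n Z\ \le\ Z^* M^{-1}\,2\pi\!\int_{-\pi}^{\pi}f_2(-\lambda)\,dM(\lambda)\,M^{-1}Z.$$

But as $\max[f_2(\lambda)-f_1(\lambda)]$ can be made as small as required except in small neighborhoods of the points of discontinuity of $f(\lambda)$, it follows that

$$\lim_{n\to\infty}D_n E(c_L-\gamma)(c_L-\gamma)^* D_n = M^{-1}\,2\pi\int_{-\pi}^{\pi}f(-\lambda)\,dM(\lambda)\,M^{-1}. \tag{1}$$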


We now derive an analogous expression for the optimal estimate $c_0$. We deal at first with a disturbance of the autoregressive type

$$\sum_{k=0}^{a} b_k\,x_{t-a+k} = \eta_t,$$

with the roots of the characteristic equation $\sum_{k=0}^{a} b_k z^k = 0$ inside the unit circle. This corresponds (see section 1.6) to a spectral density

$$f(\lambda) = \frac{1}{2\pi}\,\frac{1}{\big|\sum_{\nu=0}^{a} b_\nu e^{i\nu\lambda}\big|^2}.$$

Consider the space spanned by the stochastic variables $x_1,x_2,\dots,x_n$. The vectors $\eta_{a+1},\eta_{a+2},\dots,\eta_n$ are orthonormal and orthogonal to $x_1,x_2,\dots,x_a$ (see section 1.6). Using the Gram-Schmidt orthogonalization procedure on the latter vectors we see that there are numbers $d_{\nu\mu}$ so that

$$\begin{aligned}
d_{11}x_1 &= \varepsilon_1\\
d_{21}x_1 + d_{22}x_2 &= \varepsilon_2\\
&\ \ \vdots\\
d_{a1}x_1 + d_{a2}x_2 + \dots + d_{aa}x_a &= \varepsilon_a\\
b_0x_1 + b_1x_2 + \dots + b_a x_{a+1} &= \varepsilon_{a+1} = \eta_{a+1}\\
0\cdot x_1 + b_0x_2 + \dots + b_a x_{a+2} &= \varepsilon_{a+2} = \eta_{a+2}\\
&\ \ \vdots\\
0\cdot x_1 + 0\cdot x_2 + \dots + b_0x_{n-a} + \dots + b_a x_n &= \varepsilon_n = \eta_n,
\end{aligned}$$

where the ε's are orthonormal vectors. Denoting the lower triangular matrix premultiplying $x = (x_1,\dots,x_n)'$ by $A$, we see that the equations above can be written

$$Ax = \varepsilon. \tag{2}$$

Thus $A$ is a nonsingular transformation carrying $x_1,\dots,x_n$ into the orthonormal system $\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n$. On taking the covariance matrix of both sides of (2) we have

$$ARA^* = I,$$

so that $R^{-1} = A^*A$.
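For the simplest case $a = 1$ the construction can be written down explicitly. Assuming, for illustration, $x_t = \rho x_{t-1} + \eta_t$ with unit innovation variance, we have $b_0 = -\rho$, $b_1 = 1$, and the single Gram-Schmidt coefficient is $d_{11} = \sqrt{1-\rho^2}$. The sketch below (hypothetical names, pure Python) builds $A$, checks numerically that $A^*A\,R = I$, and shows that $A^*A$ coincides with the banded Toeplitz matrix (diagonal $1+\rho^2$, off-diagonals $-\rho$) except at the two corner entries, in line with the "at most $4a^2$" exceptional elements used below.

```python
def ar1_covariance(n, rho):
    """R_{uv} = rho^|u-v| / (1 - rho^2): covariance of x_t = rho x_{t-1} + eta_t, Var(eta_t) = 1."""
    return [[rho ** abs(u - v) / (1 - rho * rho) for v in range(n)] for u in range(n)]


def ar1_A(n, rho):
    """Lower triangular A with A x = eps orthonormal, for a = 1, b_0 = -rho, b_1 = 1.

    Row 1 is the Gram-Schmidt step d_11 x_1 = eps_1 (d_11 = 1 / sd(x_1)); rows t >= 2 are the
    autoregressive relations b_0 x_{t-1} + b_1 x_t = eta_t.
    """
    A = [[0.0] * n for _ in range(n)]
    A[0][0] = (1 - rho * rho) ** 0.5
    for t in range(1, n):
        A[t][t - 1] = -rho
        A[t][t] = 1.0
    return A


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


n, rho = 6, 0.6                       # illustrative values
A = ar1_A(n, rho)
R_inv = matmul(transpose(A), A)       # A* A (A is real here)
check = matmul(R_inv, ar1_covariance(n, rho))
print([[round(x, 12) for x in row] for row in check])     # should be (numerically) I
print(R_inv[0][0], R_inv[2][2], R_inv[n - 1][n - 1])       # corners differ from 1 + rho^2
```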

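Hence, except for at most $4a^2$ elements of the matrix, we have

$$\{R^{-1}\}_{uv} = \sum_{w=-\infty}^{\infty}\bar b_{u-w}\,b_{v-w} = \sum_{\nu=-\infty}^{\infty}\bar b_\nu\,b_{\nu-(u-v)},$$

where we have set $b_\nu = 0$ if $\nu<0$ or $\nu>a$. But

$$\sum_{\nu=-\infty}^{\infty}\bar b_\nu\,b_{\nu-m} = \frac{1}{4\pi^2}\int_{-\pi}^{\pi}e^{im\lambda}\,\frac{d\lambda}{f(\lambda)}.$$

Consider

$$\{D_n^{-1}\Phi^* R^{-1}\Phi D_n^{-1}\}_{\nu\mu} = \sum_{u,v=1}^{n}\frac{\bar\varphi_u^{(\nu)}\,r_{u,v}^{(-1)}\,\varphi_v^{(\mu)}}{\sqrt{\Phi_n^{(\nu)}\Phi_n^{(\mu)}}} = \sum_{m=0}^{a}\rho_m\sum_{v=1}^{n-m}\frac{\bar\varphi_{v+m}^{(\nu)}\varphi_v^{(\mu)}}{\sqrt{\Phi_n^{(\nu)}\Phi_n^{(\mu)}}} + \sum_{m=-a}^{-1}\rho_m\sum_{u=1}^{n+m}\frac{\bar\varphi_u^{(\nu)}\varphi_{u-m}^{(\mu)}}{\sqrt{\Phi_n^{(\nu)}\Phi_n^{(\mu)}}} + \delta_n. \tag{3}$$

Here $\{r_{u,v}^{(-1)}\} = R^{-1}$,

$$\rho_m = \frac{1}{4\pi^2}\int_{-\pi}^{\pi}e^{im\lambda}\,\frac{d\lambda}{f(\lambda)},$$

and $\delta_n$ tends to zero as $n$ increases. Indeed $\delta_n$ is the sum of at most $4a^2$ terms of the form

$$\big(r_{u,v}^{(-1)}-\rho_{u-v}\big)\,\frac{\bar\varphi_u^{(\nu)}}{\sqrt{\Phi_n^{(\nu)}}}\,\frac{\varphi_v^{(\mu)}}{\sqrt{\Phi_n^{(\mu)}}}.$$

The first factor is bounded uniformly in $n$, as seen from the relation $R^{-1} = A^*A$. In fact

$$|r_{u,v}^{(-1)}|\le(a+1)K^2,\qquad\text{where}\quad K = \max_{0\le i,j,k\le a}\{|d_{ij}|,|b_k|\},$$

and likewise $|\rho_m|\le(a+1)K^2$. Moreover

$$\frac{|\varphi_u^{(\nu)}|^2}{\Phi_n^{(\nu)}} = \frac{\Phi_u^{(\nu)}-\Phi_{u-1}^{(\nu)}}{\Phi_u^{(\nu)}}\cdot\frac{\Phi_u^{(\nu)}}{\Phi_n^{(\nu)}},\qquad u\le n.$$

Either $u$ is bounded, and then the second factor tends to zero, or it is not, and then the first factor tends to zero since $\Phi_u^{(\nu)}$ is slowly increasing. On letting $n\to\infty$ it follows from (3) that

$$\lim_{n\to\infty}D_n^{-1}\Phi^* R^{-1}\Phi D_n^{-1} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{f(-\lambda)}\,dM(\lambda). \tag{4}$$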
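Given a general $f(\lambda)$ we approximate it from above and below by spectral densities of autoregressive schemes, and an argument of the same type as before shows that (4) is valid. As

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{f(-\lambda)}\,dM(\lambda)\ \ge\ \frac{1}{2\pi\max_\lambda f(\lambda)}\,M$$

and $M$ is nonsingular, it follows that the matrix on the left side of (4) is also nonsingular, so that

$$\lim_{n\to\infty}D_n E(c_0-\gamma)(c_0-\gamma)^* D_n = 2\pi\left[\int_{-\pi}^{\pi}\frac{1}{f(-\lambda)}\,dM(\lambda)\right]^{-1}. \tag{5}$$

Since $c_0$ has minimum variance, it follows from the construction that

$$M^{-1}\int_{-\pi}^{\pi}f(-\lambda)\,dM(\lambda)\,M^{-1}\ \ge\ \left[\int_{-\pi}^{\pi}\frac{1}{f(-\lambda)}\,dM(\lambda)\right]^{-1}.$$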

Let us now look at the case of greatest interest, where the process $y_t$ and the regression components $\varphi_t^{(j)}$ are real-valued. Since $x_t$ is real-valued the spectral density $f(\lambda)$ is symmetric about zero, that is, $f(\lambda) = f(-\lambda)$. The fact that the regression components $\varphi_t^{(j)}$ are real implies that

$$dM(\lambda) = \overline{dM(-\lambda)}.$$

It is then convenient to introduce the nondecreasing matrix-valued function

$$T(\lambda) = M(\lambda+)-M(-\lambda-),\qquad 0\le\lambda\le\pi,$$

with real elements. Results (1) and (5) can then be rewritten as

$$\lim_{n\to\infty}D_n E(c_L-\gamma)(c_L-\gamma)^* D_n = T^{-1}\,2\pi\int_0^{\pi}f(\lambda)\,dT(\lambda)\,T^{-1} \tag{6}$$

and

$$\lim_{n\to\infty}D_n E(c_0-\gamma)(c_0-\gamma)^* D_n = 2\pi\left(\int_0^{\pi}\frac{1}{f(\lambda)}\,dT(\lambda)\right)^{-1} \tag{7}$$

respectively, where $T = T(\pi) = M$.

The regression spectrum $S$ in the real-valued case is defined just as in the complex-valued case. Let λ be a point such that for any interval $I$ containing λ, $\Delta T(I) > 0$. The set of all such points λ is the regression spectrum. Now $S$ is a set of points in the interval $0\le\lambda\le\pi$ instead of a set of points in the interval $-\pi\le\lambda\le\pi$.
7.4. Elements of the Spectrum: Asymptotic Conditions for the Efficiency of a Least Squares Estimate

In the study of the least squares estimates an important role is played by the elements of the spectrum $S$. The elements of the spectrum are defined by the following construction. The matrix-valued function

$$N(\lambda) = M^{-1/2}M(\lambda)M^{-1/2}$$

is a nondecreasing function, since for $\lambda_1>\lambda_2$

$$N(\lambda_1)-N(\lambda_2) = M^{-1/2}[M(\lambda_1)-M(\lambda_2)]M^{-1/2}\ge 0.$$

$N(\lambda)$ is normed so that

$$N(\pi)-N(-\pi) = I.$$

We say that a set $H$ has zero $N$ measure if $\int_H dN(\lambda) = 0$, and we identify sets differing only on a set of zero $N$ measure. Note that $N$ measure zero and $M$ measure zero (analogously defined) are equivalent. Let $E_1\subset S$, $E_2 = S-E_1$. If there is no set $E_1\subset S$ such that the matrices

$$N(E_1) = \int_{E_1}dN(\lambda),\qquad N(E_2) = \int_{E_2}dN(\lambda)$$

are both $>0$ and are orthogonal,

$$N(E_1)N(E_2) = 0,$$

we say that $S$ has one element, the set $S$ itself. If there are sets $E_1, E_2$ satisfying the above conditions, we can ask whether they can be decomposed again in the same manner. A sequence of such decompositions will end in a finite number of steps. The order $p$ of $I$ is finite, so that there can be at most $p-1$ such decompositions and at most $p$ elements $E_i$ in the final decomposition, since

$$N(E_i) > 0,\qquad N(E_i)N(E_j) = 0,\quad i\ne j.$$

Let there be $q$ elements $E_1,\dots,E_q$, $q\le p$, in the final decomposition. Then

$$N(E_j) = \int_{E_j}dN(\lambda) > 0,\quad j = 1,2,\dots,q,\qquad \sum_{j=1}^{q}N(E_j) = I,\qquad N(E_i)N(E_j) = 0\ \ \text{for}\ i\ne j,$$
where none of the $E_j$ can be decomposed further. Multiplying the second of these relations by $N(E_i)$ we obtain $N^2(E_i) = N(E_i)$, i.e., $N(E_i)$ is a projection operator. Of course this could have been deduced directly from the orthogonality between $N(E_i)$ and $\sum_{j\ne i}N(E_j)$. By the elements of $S$ we mean the sets $E_i$ of such a maximal decomposition of $S$ into nonoverlapping sets with orthogonal nonnegative definite increments $N(E_i)$ of $N(\lambda)$.

We now have to show that the construction just given defines the elements of $S$ uniquely. Suppose the opposite is true; there are then two distinct collections $\{E_j\}$, $\{E'_j\}$ which form maximal decompositions of $S$ into nonoverlapping sets with orthogonal $N$ increments. Then there must be sets $E_k$, $E'_i$, $E'_j$ ($i\ne j$) such that

$$N(E_k\cap E'_i) > 0,\qquad N(E_k\cap E'_j) > 0.$$

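We now need the following result. If $M_1, M_2, N_1, N_2$ are Hermitian nonnegative definite $p\times p$ matrices,

$$M = M_1+M_2,\qquad N = N_1+N_2,$$

and $MN = 0$, then $M_iN_j = 0$, $i,j = 1,2$. Assume that a unitary transformation (applied to all four matrices) has been used to carry $M$ into diagonal form with the first $m$ diagonal elements $d_1,d_2,\dots,d_m\ne 0$ and the remaining ones zero. Since $MN = 0$, the first $m$ rows of $N$ vanish; in particular the first $m$ diagonal elements $\{N\}_{ll}$, $l = 1,2,\dots,m$, of $N$ are zero. The same is clearly true of $N_1$ and $N_2$. On using the nonnegative definiteness of the matrices, we see that the only nonvanishing elements of $N_1, N_2$ are those with subscripts $(r,s)$ where $m+1\le r,s\le p$. In the same way the only nonvanishing elements of $M_1, M_2$ are those with subscripts $(r,s)$ where $1\le r,s\le m$. But then $M_iN_j = 0$.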

Let us write $E_k = (E'_i\cap E_k)\cup F = D\cup F$. As $N(E_k) = N(D)+N(F)$ is orthogonal to every $N(E_l)$ with $l\ne k$, it follows that the same holds for $N(D)$ and $N(F)$. But as $N(E'_i)\,N\big(\bigcup_{j\ne i}E'_j\big) = 0$, it also follows that $N(E'_i\cap E_k)\,N(F) = 0$, so that $E_k$ can be decomposed further, contrary to our assumption. This proves the uniqueness of the elements as constructed above.
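When $M(\lambda)$ is purely discrete with finitely many jump points (as in the polynomial and trigonometric regressions of section 7.5), the elements can be computed mechanically. Since $\Delta N_k\,\Delta N_l = M^{-1/2}\,\Delta M_k M^{-1}\Delta M_l\,M^{-1/2}$, two jumps have orthogonal $N$ increments exactly when $\Delta M_k M^{-1}\Delta M_l = 0$. By the lemma just proved, two points whose increments are not orthogonal can never be separated, so the elements are the connected components of that relation. The sketch below is our own (hypothetical names). Example (a) is the single regressor $e^{it\lambda_1}+e^{it\lambda_2}$, with jumps $1/2$ at both points. Example (b) uses the regressors $e^{it\lambda_1}$ and $e^{it\lambda_1}+e^{it\lambda_2}$, whose jump matrices we worked out from (7.2.2).

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular matrix."""
    p = len(M)
    A = [[complex(x) for x in row] + [complex(i == j) for j in range(p)]
         for i, row in enumerate(M)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [x / d for x in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[p:] for row in A]


def elements(jumps, tol=1e-9):
    """Elements of a purely discrete regression spectrum.

    jumps[k] is the p x p jump Delta M at the k-th point. Points k and k2 are joined when
    Delta M_k M^{-1} Delta M_k2 != 0; the elements are the resulting connected components.
    """
    p = len(jumps[0])
    M = [[sum(J[i][j] for J in jumps) for j in range(p)] for i in range(p)]
    M_inv = inverse(M)
    parent = list(range(len(jumps)))

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    for k in range(len(jumps)):
        for k2 in range(k + 1, len(jumps)):
            P = matmul(matmul(jumps[k], M_inv), jumps[k2])
            if max(abs(x) for row in P for x in row) > tol:
                parent[find(k)] = find(k2)
    groups = {}
    for k in range(len(jumps)):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


s = 1 / math.sqrt(2)
print(elements([[[0.5]], [[0.5]]]))                                 # (a): one element
print(elements([[[1.0, s], [s, 0.5]], [[0.0, 0.0], [0.0, 0.5]]]))    # (b): two elements
```

In (b) the two regressors span the same space as $e^{it\lambda_1}$ and $e^{it\lambda_2}$, so the two points separate into distinct elements; in (a) they cannot be separated.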

We have shown in section 7.3 that a necessary and sufficient condition for the asymptotic efficiency of the least squares estimate is that

$$\int_{-\pi}^{\pi}f(-\lambda)\,dN(\lambda)\int_{-\pi}^{\pi}\frac{1}{f(-\mu)}\,dN(\mu) = I. \tag{1}$$

Using the decomposition of $S$ into elements we see that (1) is equivalent to

$$\sum_{j=1}^{q}\int_{E_j}f(-\lambda)\,dN(\lambda)\int_{E_j}\frac{1}{f(-\mu)}\,dN(\mu) = I.$$

If $f(-\lambda) = c_j > 0$ on $E_j$, the above sum reduces to

$$\sum_{j=1}^{q}N^2(E_j) = \sum_{j=1}^{q}N(E_j) = I,$$

so that we have asymptotic efficiency.

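The converse is also true. Suppose that (1) holds. Using a unitary transformation $U$ we simultaneously diagonalize the matrices

$$\int_{-\pi}^{\pi}f(-\lambda)\,dN(\lambda)\qquad\text{and}\qquad\int_{-\pi}^{\pi}\frac{1}{f(-\mu)}\,dN(\mu).$$

Then

$$\int_{-\pi}^{\pi}f(-\lambda)\,d\{UN(\lambda)U^*\}_{ii}\int_{-\pi}^{\pi}\frac{1}{f(-\mu)}\,d\{UN(\mu)U^*\}_{ii} = 1,\qquad i = 1,2,\dots,p.$$

Write $N_i(\lambda) = \{UN(\lambda)U^*\}_{ii}$. Since $\int dN_i(\lambda) = \{UU^*\}_{ii} = 1$, this is the case of equality in the Schwarz inequality $\big(\int dN_i\big)^2\le\int f(-\lambda)\,dN_i(\lambda)\int f(-\mu)^{-1}\,dN_i(\mu)$. We know that the Schwarz inequality reduces to an equality if and only if the integrands are proportional on the set on which the nondecreasing function $N_i(\lambda)$, with respect to which we integrate, increases. Hence $f(-\lambda)$ has to be a constant $c_i$ on each of these $p$ sets $A_i$, $i = 1,2,\dots,p$. Consider any chain of sets $A_{i_1},A_{i_2},\dots,A_{i_k}$ such that

$$N_{i_1}(A_{i_1}\cap A_{i_2}) > 0,\quad\dots,\quad N_{i_k}(A_{i_{k-1}}\cap A_{i_k}) > 0.$$

It is clear that $f(-\lambda)$ must equal the same constant $\gamma_1 = c_1$ on every set in the chain. Let $B_1$ be the union of all sets $A_i$ which are links in any such chain containing the set $A_1$. Then $f(-\lambda) = \gamma_1 = c_1$ on $B_1$. For every set $A_i$ construct the union of all sets $A_j$ which are links in such a chain containing $A_i$. We will then get nonoverlapping sets $B_i$, $i = 1,2,\dots,r\le p$, and $f(-\lambda) = \gamma_i$ is a constant on each of these sets. Then (1) reduces to

$$\sum_{i=1}^{r}N^2(B_i) = I. \tag{2}$$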
But as $N(B_i)\ge 0$ and

$$\sum_{i=1}^{r}N(B_i) = I$$

we have $N(B_i)\le I$ and $N^2(B_i)\le N(B_i)$. This together with (2) implies that $N^2(B_i) = N(B_i)$, i.e., the $N(B_i)$ are projection operators. But as their sum is the projection $I$, they must be orthogonal (see Halmos [2], p. 53). Each $B_i$ is then the union of a number of elements of the spectrum $S$. Hence $f(-\lambda)$ is constant on each element, as was to be proved. We restate what has been proved in terms of $M(\lambda)$.

The spectrum $S$ of the regression can be decomposed into disjoint sets $E_j$, $j = 1,2,\dots,q\le p$, such that

$$M(E_i)M^{-1}M(E_j) = \delta_{ij}M(E_i),\qquad M(E_i) > 0,$$

and this is the finest such decomposition. The sets $E_j$ are called the elements of the regression spectrum.

The least squares estimate $c_L$ of the regression coefficients is asymptotically efficient if and only if the spectral density is constant on each of the elements of $S$.
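For a single regression vector ($p = 1$, so that $M = 1$) the criterion can be made quantitative. By (7.3.1) and (7.3.5) the ratio of the limiting variances of $c_0$ and $c_L$ is $M^2/\big(\int f(-\lambda)\,dM(\lambda)\int f(-\lambda)^{-1}dM(\lambda)\big)$, which is at most 1 by the Schwarz inequality. The sketch below (our own hypothetical helper, with an assumed AR(1) density as the example) evaluates this ratio for a purely discrete $M$. For $\varphi_t = e^{it\lambda_1}+e^{it\lambda_2}$ the jumps are $1/2$ at each point, the two points form one element, and the ratio becomes $4f_1f_2/(f_1+f_2)^2$. This is below 1 unless $f$ takes the same value at both points, as the theorem predicts.

```python
import math


def scalar_efficiency(weights, f_values):
    """Asymptotic efficiency of least squares for one regressor (p = 1) whose spectral function M
    jumps by weights[k] at points where f(-lambda) takes the value f_values[k].

    Ratio of (7.3.5) to (7.3.1): M^2 / (int f(-l) dM(l) * int dM(l) / f(-l)).
    """
    M = sum(weights)
    a = sum(w * f for w, f in zip(weights, f_values))
    b = sum(w / f for w, f in zip(weights, f_values))
    return M * M / (a * b)


def ar1_density(lam, rho):
    """AR(1) spectral density with unit innovation variance (an assumed example)."""
    return 1.0 / (2 * math.pi * (1 - 2 * rho * math.cos(lam) + rho * rho))


rho = 0.5
# phi_t = e^{i t 0.3} + e^{i t 2.0}: M jumps by 1/2 at 0.3 and at 2.0 (one element, two points)
f_vals = [ar1_density(-0.3, rho), ar1_density(-2.0, rho)]
print(scalar_efficiency([0.5, 0.5], f_vals))                 # below 1
print(scalar_efficiency([1.0], [ar1_density(-0.3, rho)]))    # one point: 1 up to rounding
```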

It is of especial interest to find out when the least squares estimate $c_L$ of γ will be asymptotically efficient for every positive piecewise continuous spectral density $f(\lambda)$, that is, asymptotically efficient for most stationary processes met in practice. From the above it is clear that this will be valid if and only if the spectrum $S$ of $M(\lambda)$ has $q$ elements which are distinct points $\lambda_1,\lambda_2,\dots,\lambda_q$, $q\le p$. The least squares estimate $c_L$ of γ is asymptotically efficient for any process $x_t$ with a positive piecewise continuous spectral density $f(\lambda)$ if and only if the spectrum $S$ of $M(\lambda)$ has elements which are $q$ distinct points $\lambda_1,\dots,\lambda_q$, $q\le p$.

For  a  treatment  of  the  analogous  problem  for  vector-valued  processes 
see  Rosenblatt  [1]. 

The results are completely analogous in the real-valued case. We state them for completeness. The spectrum $S$ of the regression can be decomposed into disjoint sets $E_j$, $j = 1,\dots,q\le p$, such that

$$T(E_i)T^{-1}T(E_j) = \delta_{ij}T(E_i),\qquad T(E_i) > 0,$$

and this is the finest such decomposition. The sets $E_j$ are called the elements of the regression spectrum. The least squares estimate $c_L$ of the regression coefficients is asymptotically efficient if and only if the spectral density is constant on each of the elements of $S$. The least squares estimate $c_L$ of γ is asymptotically efficient for any real-valued process with a positive piecewise continuous spectral density if and only if the spectrum $S$ of $T(\lambda)$ has elements which are $q$ distinct points $\lambda_1,\lambda_2,\dots,\lambda_q$, $q\le p$.

7.5.  Polynomial  and  Trigonometric  Regression 

We  shall  first  consider  the  following  rather  simple  but  important  case. 
Let the regression variables be

$$\varphi_t^{(k)} = t^\nu e^{it\lambda_s},\qquad \nu = 0,1,2,\dots,p_s,\quad s = 1,2,\dots,m,$$

where $\lambda_1,\lambda_2,\dots,\lambda_m$ are distinct frequencies in $(-\pi,\pi)$. In this enumeration scheme a superscript $k$ corresponds to a pair of integers $(\nu,s)$, where ν is the power to which $t$ is raised and $s$ is the index of the frequency. Of course we could leave out certain pairs $(\nu,s)$, but it is easy to see how the following discussion should be modified.


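Here

$$\Phi_n^{(k)} = \sum_{t=1}^{n}t^{2\nu}\sim\frac{n^{2\nu+1}}{2\nu+1}$$

is a slowly increasing sequence. We have

$$R_h^{(k,k')} = \lim_{n\to\infty}\frac{1}{\sqrt{\Phi_n^{(k)}\Phi_n^{(k')}}}\sum_{t=1}^{n}(t+h)^{\nu}t^{\nu'}e^{i(t+h)\lambda_s}e^{-it\lambda_{s'}} = \frac{\sqrt{(2\nu+1)(2\nu'+1)}}{\nu+\nu'+1}\,\delta_{ss'}\,e^{ih\lambda_s},$$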

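as is easily verified, where $k$ corresponds to $(\nu,s)$ and $k'$ to $(\nu',s')$. In the spectral representation

$$R_h = \int_{-\pi}^{\pi}e^{ih\lambda}\,dM(\lambda),$$

the matrix function $M(\lambda)$ is constant except at the points $\lambda_s$. To each $\lambda_s$ there corresponds a jump

$$\Delta M(\lambda_s) = \begin{pmatrix}0&0&0\\ 0&M_s&0\\ 0&0&0\end{pmatrix}$$

(the upper left zero block is of order $p_1+p_2+\dots+p_{s-1}+s-1$),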


where $M_s$ is the nonsingular matrix

$$M_s = \left\{\frac{\sqrt{(2\nu+1)(2\nu'+1)}}{\nu+\nu'+1};\ \nu,\nu' = 0,1,\dots,p_s\right\}.$$

Thus $M = M(\pi)-M(-\pi)$ is the direct sum of the nonsingular matrices $M_1,M_2,\dots,M_m$. As each matrix $M_s$ is nonsingular it follows that $M = R_0$ is also nonsingular. One can see that $M_s$ is nonsingular in the following manner. Now

$$M_s = D\left\{\frac{1}{\nu+\nu'+1};\ \nu,\nu' = 0,1,\dots,p_s\right\}D = DA_sD,$$

where $D$ is the diagonal matrix with the nonzero elements $\sqrt{2\nu+1}$, $\nu = 0,1,\dots,p_s$, in the main diagonal. If $A_s$ were singular there would exist a nontrivial $(p_s+1)$-vector $z$ such that

$$0 = z^*A_sz = \sum_{\nu,\nu'=0}^{p_s}\frac{\bar z_\nu z_{\nu'}}{\nu+\nu'+1} = \int_0^1\Big|\sum_{\nu=0}^{p_s}z_\nu x^\nu\Big|^2dx,$$

implying that $\sum_{\nu=0}^{p_s}z_\nu x^\nu\equiv 0$. But then $z = 0$, contrary to our assumption. The regression spectrum $S$ of $M(\lambda)$ consists of the elements $\lambda_1,\dots,\lambda_m$ with corresponding orthogonal saltuses $\Delta M(\lambda_1),\dots,\Delta M(\lambda_m)$. Since the elements of the spectrum $S$ are the distinct points $\lambda_s$, it is clear that the least squares estimate $c_L$ of the vector of regression coefficients γ is asymptotically efficient in this context, whatever the spectral density is.
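The nonsingularity argument for $A_s$ can be checked in exact rational arithmetic. The sketch below (hypothetical helper names; Python's `fractions` module) runs symmetric Gaussian elimination on $A_s = \{1/(\nu+\nu'+1)\}$, a Hilbert matrix. All pivots are positive, which for a symmetric matrix is equivalent to positive definiteness, since the pivots are ratios of consecutive leading principal minors. The sketch also confirms the identity $z^*A_sz = \int_0^1|\sum z_\nu x^\nu|^2\,dx$ used in the text for one real vector $z$. Since $M_s = DA_sD$ with $D$ nonsingular, $M_s$ is then positive definite as well.

```python
from fractions import Fraction


def A_s(p_s):
    """A_s = {1/(nu + nu' + 1); nu, nu' = 0, ..., p_s}, exactly (a Hilbert matrix)."""
    return [[Fraction(1, i + j + 1) for j in range(p_s + 1)] for i in range(p_s + 1)]


def elimination_pivots(A):
    """Pivots of symmetric Gaussian elimination without row exchanges, in exact arithmetic.

    A real symmetric matrix is positive definite iff all these pivots are > 0.
    """
    A = [row[:] for row in A]
    size = len(A)
    pivots = []
    for k in range(size):
        d = A[k][k]
        pivots.append(d)
        if d == 0:
            break
        for i in range(k + 1, size):
            factor = A[i][k] / d
            for j in range(k, size):
                A[i][j] -= factor * A[k][j]
    return pivots


def quadratic_form(A, z):
    """z' A z for a real vector z."""
    return sum(z[i] * A[i][j] * z[j] for i in range(len(z)) for j in range(len(z)))


def integral_of_square(z):
    """int_0^1 (sum_nu z_nu x^nu)^2 dx, computed exactly from the coefficients of the square."""
    square = [Fraction(0)] * (2 * len(z) - 1)
    for i, a in enumerate(z):
        for j, b in enumerate(z):
            square[i + j] += Fraction(a) * b
    return sum(c / (k + 1) for k, c in enumerate(square))


print(elimination_pivots(A_s(4)))                                          # all positive
print(quadratic_form(A_s(2), [1, -2, 3]), integral_of_square([1, -2, 3]))  # equal
```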

Consider, for example, the simplest case of trigonometric regression

$$y_t = \gamma_1e^{it\lambda_1}+\gamma_2e^{it\lambda_2}+\dots+\gamma_pe^{it\lambda_p}+x_t.$$

If the frequencies $\lambda_1,\lambda_2,\dots,\lambda_p$ are different the above result holds, so that the least squares estimates of $\gamma_1,\gamma_2,\dots,\gamma_p$ are asymptotically efficient. The covariances of these estimates will be given by the asymptotic expressions (see (7.3.1))

$$\operatorname{cov}(c_j,c_k) = \frac{2\pi}{n}f(\lambda_j)\,\delta_{jk}+o\!\left(\frac{1}{n}\right).$$

Hence for large samples they are approximately uncorrelated, with variances depending only upon the spectral density at the points $\lambda_1,\lambda_2,\dots,\lambda_p$.
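The asymptotic efficiency in this example can be seen numerically for one frequency ($p = 1$, $\varphi_t = e^{it\lambda}$). The sketch below assumes, for illustration only, AR(1) noise with $r_h = \sigma^2\rho^{|h|}/(1-\rho^2)$ and $2\pi f(\lambda) = \sigma^2/|1-\rho e^{-i\lambda}|^2$. It computes $n\operatorname{Var}(c_L)$ exactly by grouping the double sum by lag, and $n\operatorname{Var}(c_0) = n/(\varphi^*R^{-1}\varphi)$ from the tridiagonal AR(1) inverse (the case $a = 1$ of $R^{-1} = A^*A$). Both approach $2\pi f(\lambda)$, and $c_0$ is never worse. The function names are hypothetical.

```python
import math


def n_var_least_squares(n, lam, rho, sigma2=1.0):
    """n * Var(c_L) for phi_t = e^{i t lam}, where c_L = (1/n) sum_t e^{-i t lam} y_t.

    Var(c_L) = (1/n^2) sum_{t,s} e^{-i(t-s) lam} r_{t-s}; grouping by the lag h = t - s gives
    n * Var(c_L) = sum_{|h|<n} (1 - |h|/n) r_h cos(h lam) for the assumed AR(1) noise.
    """
    total = 0.0
    for h in range(-(n - 1), n):
        r_h = sigma2 * rho ** abs(h) / (1 - rho * rho)
        total += (1 - abs(h) / n) * r_h * math.cos(h * lam)
    return total


def n_var_blue(n, lam, rho, sigma2=1.0):
    """n * Var(c_0) = n / (phi* R^{-1} phi), using the tridiagonal AR(1) inverse
    R^{-1} = (1/sigma2) * (diagonal 1, 1 + rho^2, ..., 1 + rho^2, 1; off-diagonals -rho)."""
    quad = ((n - 2) * (1 + rho * rho) + 2 - 2 * rho * (n - 1) * math.cos(lam)) / sigma2
    return n / quad


def two_pi_f(lam, rho, sigma2=1.0):
    """2*pi*f(lam) = sigma2 / |1 - rho e^{-i lam}|^2 for the AR(1) noise."""
    return sigma2 / (1 - 2 * rho * math.cos(lam) + rho * rho)


n, lam, rho = 5000, 1.0, 0.5
print(n_var_least_squares(n, lam, rho), n_var_blue(n, lam, rho), two_pi_f(lam, rho))
```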


When we have polynomial regression

$$y_t = \gamma_0+\gamma_1t+\dots+\gamma_pt^p+x_t$$

we see that the spectrum consists of only one point, $\lambda = 0$, with the corresponding jump of $M(\lambda)$

$$\Delta M(0) = \left\{\frac{\sqrt{(2\nu+1)(2\nu'+1)}}{\nu+\nu'+1};\ \nu,\nu' = 0,1,\dots,p\right\}.$$

It follows that the least squares estimates are again asymptotically efficient and have the covariances

$$\operatorname{cov}(c_j,c_k)\sim\frac{2\pi f(0)\,m^{jk}}{n^{j+k+1}},$$

where

$$\{m^{jk}\} = \left\{\frac{1}{\nu+\nu'+1};\ \nu,\nu' = 0,1,\dots,p\right\}^{-1}.$$

The  real- valued  case  is  again  analogous  though  somewhat  more  tedious. 
We  shall  state  some  of  the  analogous  results.  Suppose  the  regression  se- 
quences are  of  the  form 

1(f(t')  =  tv  COS  tXx  J>  =  0,  1,  ...,  j^ 

rf>P  =  f  sin  tXx       v^O,  1,  .>.,2Pi 


i^(lPl+---+lPm_1  +  m-l+,)  =  f  cog  ^^  r  =  ()j  j     _s  i2>m 

2^p----^m-i^-i^)  =  r  sin  ^m        v  =  0,  1,  ...,  2pm, 

where  Al5 ...,  Am  are  distinct  frequencies  in  (0,  tt).  The  sine  terms  for  the 
frequency  As  do  not  appear  if  Xs  =  0.  Since  we  have  already  discussed  the 
case  Xs  =  0,  we  shall  assume  that  none  of  the  frequencies  Xs  are  zero.  One 
finds  that  T{X)  is  constant  except  at  the  points  Xs.  At  the  point  Xs  there  is 
a  jump 

iPi  +  2P1  +  ' '  *  +  iPs-i  +  2Ps-i  +  2 5  -  2 


A  T  (A,)  = 


0 

0 

0 

0 

0 

iTs 

0 

0 

0 

0 

T 

2J  s 

0 

0 

0 

0 

0 
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where 


/l/(2y+l)(2y'  +  l)  , 

i*1. -  — ^7^ ;  v,  v  -a, ....  lPs 


V(2y  +  l)(2y/  +  l) 
v  +  v'  +  l 


2Ts  =  \ ,     ;  :  , :  v,  v'  =  0,  ...,  2ps 


Note that for the types of regression dealt with in this section one has asymptotic efficiency of the least squares estimate for all positive piecewise continuous $f(\lambda)$.
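For the real-valued case, a similar numerical check (a sketch with illustrative names, NumPy assumed) suggests that the normalized Gram matrix of $t^\nu\cos t\lambda_s$, $\nu = 0,\dots,{}_1p_s$, and $t^\nu\sin t\lambda_s$, $\nu = 0,\dots,{}_2p_s$, has diagonal blocks close to ${}_1T_s$ and ${}_2T_s$, and a cosine-sine cross block close to zero, for large $n$.

```python
import numpy as np

def T_block(p):
    """{sqrt((2v+1)(2v'+1)) / (v+v'+1); v, v' = 0..p}."""
    nu = np.arange(p + 1)
    return np.sqrt(np.outer(2 * nu + 1, 2 * nu + 1)) / (nu[:, None] + nu[None, :] + 1)

def normalized_gram_trig(p_cos, p_sin, lam, n):
    """Normalized Gram matrix of t^v cos(t lam), v <= p_cos, then t^v sin(t lam), v <= p_sin."""
    t = np.arange(1, n + 1)
    s = t / n                                   # rescaled powers; cancels in the ratio
    cols = [s**k * np.cos(t * lam) for k in range(p_cos + 1)]
    cols += [s**k * np.sin(t * lam) for k in range(p_sin + 1)]
    X = np.column_stack(cols)
    G = X.T @ X
    d = np.sqrt(np.diag(G))
    return G / np.outer(d, d)
```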

7.6.  More  General  Trigonometric  and  Polynomial  Regression 

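We shall now generalize the type of regression sequence considered but will for convenience restrict ourselves to the case of only two such sequences. The two regression sequences are
$$\varphi_t^{(1)} = P(t)\int_{-\pi}^{\pi}e^{it\lambda}\,d\alpha(\lambda),\qquad \varphi_t^{(2)} = Q(t)\int_{-\pi}^{\pi}e^{it\lambda}\,d\beta(\lambda),$$
where $P(t)$, $Q(t)$ are polynomials $P(t) = \sum_{\nu=0}^{u}p_\nu t^\nu$, $Q(t) = \sum_{\nu=0}^{v}q_\nu t^\nu$ of orders $u$ and $v$, and $\alpha(\lambda)$, $\beta(\lambda)$ are functions of bounded variation. We also assume that $\alpha(\lambda)$, $\beta(\lambda)$ both have jumps, and denote the denumerable set of $\lambda$ values at which either $\alpha(\lambda)$ or $\beta(\lambda)$ has a jump by $\lambda_1, \lambda_2, \dots, \lambda_s, \dots$. The magnitudes of the jumps of $\alpha(\lambda)$, $\beta(\lambda)$ at $\lambda_s$ are given by $a_s$, $b_s$ respectively. Let the jump-free parts of $\alpha(\lambda)$, $\beta(\lambda)$ be given by $\alpha_1(\lambda)$, $\beta_1(\lambda)$, that is,
$$\alpha_1(\lambda) = \alpha(\lambda) - \sum_{\lambda_s\le\lambda}a_s = \alpha(\lambda) - \alpha_2(\lambda),\qquad \beta_1(\lambda) = \beta(\lambda) - \sum_{\lambda_s\le\lambda}b_s = \beta(\lambda) - \beta_2(\lambda).$$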

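We shall show that only the jumps of $\alpha(\lambda)$, $\beta(\lambda)$ can contribute to $M(\lambda)$. It will be enough to show that
$$\frac1n\sum_{t=1}^{n}\left|\int_{-\pi}^{\pi}e^{it\lambda}\,d\alpha_1(\lambda)\right|^2 = o(1). \qquad (1)$$
Now
$$\frac1n\sum_{t=1}^{n}\left|\int e^{it\lambda}\,d\alpha_1(\lambda)\right|^2 = \int\!\!\int\frac{e^{i(\lambda-\lambda')}\left(e^{in(\lambda-\lambda')}-1\right)}{n\left(e^{i(\lambda-\lambda')}-1\right)}\,d\alpha_1(\lambda)\,\overline{d\alpha_1(\lambda')} \le \int\!\!\int_{|\lambda-\lambda'|<\varepsilon}|d\alpha_1(\lambda)|\,|d\alpha_1(\lambda')| + \frac{1}{n\sin(\varepsilon/2)}\left[\operatorname{Var}\alpha_1\right]^2.$$
But $\int\!\!\int_{|\lambda-\lambda'|<\varepsilon}|d\alpha_1(\lambda)|\,|d\alpha_1(\lambda')|\to0$ as $\varepsilon\to0$, since $\alpha_1(\lambda)$ has no jumps. Thus (1) has been proven. The same result holds for $\beta_1(\lambda)$. Now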


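$$\sum_{t=1}^{n}|P(t)|^2\left|\int_{-\pi}^{\pi}e^{it\lambda}\,d\alpha(\lambda)\right|^2 = \sum_{t=1}^{n}|P(t)|^2\left\{\left|\int e^{it\lambda}\,d\alpha_1(\lambda)\right|^2 + 2\operatorname{Re}\left(\int e^{it\lambda}\,d\alpha_1(\lambda)\,\overline{\int e^{it\lambda}\,d\alpha_2(\lambda)}\right) + \left|\int e^{it\lambda}\,d\alpha_2(\lambda)\right|^2\right\}.$$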


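But
$$\sum_{t=1}^{n}|P(t)|^2\left|\int e^{it\lambda}\,d\alpha_1(\lambda)\right|^2 = |P(n)|^2\sum_{t=1}^{n}\left|\int e^{it\lambda}\,d\alpha_1(\lambda)\right|^2 - \sum_{t=1}^{n-1}\left(|P(t+1)|^2 - |P(t)|^2\right)\sum_{k=1}^{t}\left|\int e^{ik\lambda}\,d\alpha_1(\lambda)\right|^2.$$
Using the fact that $|P(t+1)|^2 - |P(t)|^2 = O(t^{2u-1})$ and (1), we see that
$$\sum_{t=1}^{n}|P(t)|^2\left|\int e^{it\lambda}\,d\alpha_1(\lambda)\right|^2 = o(n^{2u+1}).$$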


Now
$$\sum_{t=1}^{n}|P(t)|^2\left|\int e^{it\lambda}\,d\alpha_2(\lambda)\right|^2 \sim \frac{|p_u|^2\,n^{2u+1}}{2u+1}\sum_s|a_s|^2.$$


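It is then clear that
$$\left|\sum_{t=1}^{n}|P(t)|^2\int e^{it\lambda}\,d\alpha_1(\lambda)\,\overline{\int e^{it\lambda}\,d\alpha_2(\lambda)}\right| \le \left(\sum_{t=1}^{n}|P(t)|^2\left|\int e^{it\lambda}\,d\alpha_1(\lambda)\right|^2\right)^{1/2}\left(\sum_{t=1}^{n}|P(t)|^2\left|\int e^{it\lambda}\,d\alpha_2(\lambda)\right|^2\right)^{1/2} = o(n^{2u+1}).$$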
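But then
$$\sum_{t=1}^{n}|P(t)|^2\left|\int e^{it\lambda}\,d\alpha(\lambda)\right|^2 \sim \frac{|p_u|^2\,n^{2u+1}}{2u+1}\sum_s|a_s|^2.$$
In the same way one can show that
$$\sum_{t=1}^{n}|Q(t)|^2\left|\int e^{it\lambda}\,d\beta(\lambda)\right|^2 \sim \frac{|q_v|^2\,n^{2v+1}}{2v+1}\sum_s|b_s|^2.$$
A similar argument implies that, for each fixed $h$,
$$\sum_{t=1}^{n}P(t+h)\overline{Q(t)}\int e^{i(t+h)\lambda}\,d\alpha(\lambda)\,\overline{\int e^{it\lambda}\,d\beta(\lambda)} \sim \frac{p_u\bar q_v\,n^{u+v+1}}{u+v+1}\sum_s a_s\bar b_s\,e^{ih\lambda_s}.$$
It is clear that
$$M(\lambda) = D^{-1}\sum_{\lambda_s\le\lambda}\begin{pmatrix}\dfrac{|p_u|^2|a_s|^2}{2u+1} & \dfrac{p_u\bar q_v\,a_s\bar b_s}{u+v+1}\\[2mm] \dfrac{\bar p_uq_v\,\bar a_sb_s}{u+v+1} & \dfrac{|q_v|^2|b_s|^2}{2v+1}\end{pmatrix}D^{-1},$$
where $D$ is the diagonal matrix
$$D = \begin{pmatrix}|p_u|\left(\dfrac{\sum_s|a_s|^2}{2u+1}\right)^{1/2} & 0\\ 0 & |q_v|\left(\dfrac{\sum_s|b_s|^2}{2v+1}\right)^{1/2}\end{pmatrix}.$$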

We do not want to have $M = M(\pi) - M(-\pi)$ singular. Since $(2u+1)(2v+1)\le(u+v+1)^2$, with equality only when $u = v$, $M$ can be singular if and only if $u = v$ and $\sum_s|a_s|^2\sum_s|b_s|^2 = \left|\sum_s a_s\bar b_s\right|^2$, that is, if and only if $u = v$ and the sequences $a_s$ and $b_s$ are linearly related, or equivalently if and only if $u = v$ and the functions $\alpha_2(\lambda)$, $\beta_2(\lambda)$ are linearly related. We assume that this is not the case.
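Two facts used above can be illustrated numerically. The sketch below (Python with NumPy; the names and the particular jump-free part are hypothetical choices for illustration) takes $\alpha_1$ with density one on $[0,1]$, so that $\int e^{it\lambda}d\alpha_1(\lambda) = (e^{it}-1)/(it)$, adds jumps $a_s$ at $\lambda_s$, and shows that $\frac1n\sum_t|\int e^{it\lambda}d\alpha(\lambda)|^2$ approaches $\sum_s|a_s|^2$ while the jump-free part alone tends to zero, as in (1). It then builds $M = M(\pi) - M(-\pi)$ from the formula with the diagonal matrix $D$ and evaluates its determinant, which vanishes when $u = v$ and the jumps are proportional.

```python
import numpy as np

def mean_square_transform(a, lams, n):
    """Return ((1/n) sum_t |jumps + smooth|^2, (1/n) sum_t |smooth|^2) for t = 1..n.

    jumps  = sum_s a_s e^{i t lam_s}
    smooth = (e^{it} - 1)/(it), the transform of a hypothetical jump-free alpha_1
             with density 1 on [0, 1].
    """
    t = np.arange(1, n + 1)
    jumps = np.exp(1j * np.outer(t, lams)) @ np.asarray(a, dtype=complex)
    smooth = (np.exp(1j * t) - 1) / (1j * t)
    return np.mean(np.abs(jumps + smooth) ** 2), np.mean(np.abs(smooth) ** 2)

def total_M(u, v, a, b, pu=1.0, qv=1.0):
    """M = D^{-1} B D^{-1}, with B the sum over all jumps and D = diag(sqrt(B_11), sqrt(B_22))."""
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    B = np.array([
        [abs(pu) ** 2 * np.sum(abs(a) ** 2) / (2 * u + 1),
         pu * np.conj(qv) * np.sum(a * np.conj(b)) / (u + v + 1)],
        [np.conj(pu) * qv * np.sum(np.conj(a) * b) / (u + v + 1),
         abs(qv) ** 2 * np.sum(abs(b) ** 2) / (2 * v + 1)],
    ])
    D_inv = np.diag([1 / np.sqrt(B[0, 0].real), 1 / np.sqrt(B[1, 1].real)])
    return D_inv @ B @ D_inv
```

For example, `total_M(1, 1, [1, 2], [2, 4])` is singular, while the same jumps with orders $u = 0$, $v = 1$ give determinant $1 - 3/4 = 1/4$.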

The spectrum $S$ of $M(\lambda)$ consists of the denumerable set of points $\lambda_1, \lambda_2, \dots, \lambda_s, \dots$. We shall find the elements of the spectrum $S$. There can be at most two elements $E_1$, $E_2$. If there are two elements $E_1$, $E_2$, then $E_1\cap E_2 = \emptyset$, $E_1\cup E_2 = S$, and the corresponding increments
$$M_1 = \int_{E_1}dM(\lambda),\qquad M_2 = \int_{E_2}dM(\lambda)$$
satisfy $M_iM^{-1}M_j = \delta_{ij}M_i$ (see section 7.4). Moreover the manifolds corresponding to $M_1$, $M_2$ are each of dimension one. In the two-dimensional case, consider any nonsingular positive matrix $M$ and a decomposition $M = M_1 + M_2$ of $M$ into two positive matrices $M_1$, $M_2$ whose manifolds are of dimension one and have only the null vector in common. We show that
$$M_iM^{-1}M_j = \delta_{ij}M_i.$$
Let $U$ be the transformation that simultaneously diagonalizes $M_1$, $M_2$ and hence also $M$. Then $UM_1U^*$, $UM_2U^*$, $U^{*-1}M^{-1}U^{-1}$ are simultaneously diagonal. But
$$M_iM^{-1}M_j = U^{-1}(UM_iU^*)(U^{*-1}M^{-1}U^{-1})(UM_jU^*)U^{*-1} = \delta_{ij}\,U^{-1}UM_iU^*U^{*-1} = \delta_{ij}M_i,$$
since the manifolds corresponding to $M_1$, $M_2$ have only the null vector in common. But there are two elements $E_1$, $E_2$ of the spectrum $S$ if and only if there is a decomposition of the spectrum into two sets $E_1$, $E_2$ such that the corresponding increments $M_1$, $M_2$ over these sets have dimension one. This is true if and only if the matrices


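$$\begin{pmatrix}\dfrac{|p_u|^2}{2u+1}\sum_{\lambda_s\in E_i}|a_s|^2 & \dfrac{p_u\bar q_v}{u+v+1}\sum_{\lambda_s\in E_i}a_s\bar b_s\\[2mm] \dfrac{\bar p_uq_v}{u+v+1}\sum_{\lambda_s\in E_i}\bar a_sb_s & \dfrac{|q_v|^2}{2v+1}\sum_{\lambda_s\in E_i}|b_s|^2\end{pmatrix},\qquad i = 1, 2,$$
are singular. This can happen if and only if
$$\frac{\sum_{\lambda_s\in E_i}|a_s|^2\,\sum_{\lambda_s\in E_i}|b_s|^2}{(2u+1)(2v+1)} = \frac{\left|\sum_{\lambda_s\in E_i}a_s\bar b_s\right|^2}{(u+v+1)^2},\qquad i = 1, 2. \qquad (2)$$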


There are two ways in which this can occur. We could have $a_s = 0$, $\lambda_s\in E_i$, and $b_s = 0$, $\lambda_s\in E_j$, $i\ne j$. If this is not so, then both sides of equation (2) must be positive for either $E_1$ or $E_2$ (say $E_1$), and equality implies that $u = v$ and that $\{a_s\}$, $\{b_s\}$, $\lambda_s\in E_1$, are linearly related (Schwarz inequality). In any case we must also have $\{a_s\}$, $\{b_s\}$, $\lambda_s\in E_2$, linearly related. Thus, if there are two elements $E_1$, $E_2$ of the spectrum, we must either have $a_s = 0$, $\lambda_s\in E_i$, and $b_s = 0$, $\lambda_s\in E_j$, where $i\ne j$, or the orders satisfy $u = v$ and the sequences $\{a_s\}$, $\{b_s\}$, $\lambda_s\in E_i$, are linearly related, $i = 1, 2$. Note that the linear relationship between $\{a_s\}$ and $\{b_s\}$ must not be the same on both $E_1$ and $E_2$, for the matrix $M$ would then be singular. It is also clear that if $\{a_s\}$, $\{b_s\}$ satisfy either of these two conditions, there are two elements $E_1$ and $E_2$ of the spectrum.

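We now discuss the cases in which one does have asymptotic efficiency for every $f(\lambda)$. We know that there can be one or two elements. In either case, each of the elements is a point. If there is one element, the whole regression spectrum $S$ is one point, say $\mu$, and the only condition for asymptotic efficiency is that $\Delta M(\mu) = M$ be nonsingular. The nonsingularity of $M$ implies that $u\ne v$. If there are two elements, the spectrum $S$ consists of two points $\lambda_1$, $\lambda_2$, and each of these points is an element of the spectrum. There are then two possibilities corresponding to the dichotomy discussed above. The first possibility is that $\alpha_2(\lambda)$ has the one jump $\lambda_1$ and $\beta_2(\lambda)$ has the one jump $\lambda_2$. The second possibility is that $\alpha_2(\lambda)$, $\beta_2(\lambda)$ both have the jumps $\lambda_1$, $\lambda_2$, but that $u = v$ and that $a_1b_2\ne b_1a_2$. Thus we have asymptotic efficiency when
$$\varphi_t^{(1)} = P(t)\int_{-\pi}^{\pi}e^{it\lambda}\,d\alpha(\lambda),\qquad \varphi_t^{(2)} = Q(t)\int_{-\pi}^{\pi}e^{it\lambda}\,d\beta(\lambda)$$
for all positive and piecewise continuous $f(\lambda)$ if and only if one of the three following cases arises:

(1)
$$\varphi_t^{(1)} = P(t)\left[e^{it\mu} + \int_{-\pi}^{\pi}e^{it\lambda}\,d\alpha_1(\lambda)\right],\qquad \varphi_t^{(2)} = Q(t)\left[e^{it\mu} + \int_{-\pi}^{\pi}e^{it\lambda}\,d\beta_1(\lambda)\right],$$
where $u\ne v$.

(2)
$$\varphi_t^{(1)} = P(t)\left[e^{it\lambda_1} + \int_{-\pi}^{\pi}e^{it\lambda}\,d\alpha_1(\lambda)\right],\qquad \varphi_t^{(2)} = Q(t)\left[e^{it\lambda_2} + \int_{-\pi}^{\pi}e^{it\lambda}\,d\beta_1(\lambda)\right],$$
where $\lambda_1\ne\lambda_2$.

(3)
$$\varphi_t^{(1)} = P(t)\left[a_1e^{it\lambda_1} + a_2e^{it\lambda_2} + \int_{-\pi}^{\pi}e^{it\lambda}\,d\alpha_1(\lambda)\right],\qquad \varphi_t^{(2)} = Q(t)\left[b_1e^{it\lambda_1} + b_2e^{it\lambda_2} + \int_{-\pi}^{\pi}e^{it\lambda}\,d\beta_1(\lambda)\right],$$
where $u = v$, $\lambda_1\ne\lambda_2$ and $a_1b_2\ne a_2b_1$.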
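We now look at the real-valued case. Let
$$\varphi_t^{(1)} = P(t)\int_{-\pi}^{\pi}e^{itx}\,d\alpha(x),\qquad \varphi_t^{(2)} = Q(t)\int_{-\pi}^{\pi}e^{itx}\,d\beta(x), \qquad (3)$$
where $P(t)$, $Q(t)$ are polynomials of order $u$, $v$ respectively with real coefficients and $\int_{-\pi}^{\pi}e^{itx}\,d\alpha(x)$, $\int_{-\pi}^{\pi}e^{itx}\,d\beta(x)$ are real. The functions $\alpha(x)$, $\beta(x)$ are both assumed to have jumps. Let $\alpha_1(x)$, $\beta_1(x)$ again denote the jump-free parts of $\alpha(x)$, $\beta(x)$. An argument similar to that given above indicates that one has asymptotic efficiency when the regression sequences are of form (3) if and only if one of the following cases arises:

(1)
$$\begin{aligned}\varphi_t^{(1)} &= P(t)\left[a_1\cos t\lambda + a_2\sin t\lambda + a_3\cos t\mu + a_4\sin t\mu + \int_{-\pi}^{\pi}e^{itx}\,d\alpha_1(x)\right],\\ \varphi_t^{(2)} &= Q(t)\left[b_1\cos t\lambda + b_2\sin t\lambda + b_3\cos t\mu + b_4\sin t\mu + \int_{-\pi}^{\pi}e^{itx}\,d\beta_1(x)\right],\end{aligned}$$
where $\lambda, \mu\ne0$, $\lambda\ne\mu$, $u = v$, and $\begin{pmatrix}a_1&a_2\\ b_1&b_2\end{pmatrix}$, $\begin{pmatrix}a_3&a_4\\ b_3&b_4\end{pmatrix}$ are matrices of rank one.

(2)
$$\varphi_t^{(1)} = P(t)\left[a_1\cos t\lambda + a_2\sin t\lambda + \int_{-\pi}^{\pi}e^{itx}\,d\alpha_1(x)\right],\qquad \varphi_t^{(2)} = Q(t)\left[b_1\cos t\lambda + b_2\sin t\lambda + \int_{-\pi}^{\pi}e^{itx}\,d\beta_1(x)\right],$$
where $\lambda\ne0$ and $u\ne v$ or $a_1b_2\ne a_2b_1$.

(3)
$$\varphi_t^{(1)} = P(t)\left[a_1 + a_2\cos t\lambda + a_3\sin t\lambda + \int_{-\pi}^{\pi}e^{itx}\,d\alpha_1(x)\right],\qquad \varphi_t^{(2)} = Q(t)\left[b_1 + b_2\cos t\lambda + b_3\sin t\lambda + \int_{-\pi}^{\pi}e^{itx}\,d\beta_1(x)\right],$$
where $\lambda\ne0$, $u = v$ and $\begin{pmatrix}a_2&a_3\\ b_2&b_3\end{pmatrix}$ is of rank one.

(4)
$$\varphi_t^{(1)} = P(t)\left[a + \int_{-\pi}^{\pi}e^{itx}\,d\alpha_1(x)\right],\qquad \varphi_t^{(2)} = Q(t)\left[b + \int_{-\pi}^{\pi}e^{itx}\,d\beta_1(x)\right],$$
where $u\ne v$.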


7.7.  Some  Other  Types  of  Regression 


Suppose for simplicity that we deal with just one regression variable, $p = 1$, say $\varphi_t$. It is of interest to study what happens if the regression variable is periodic, so that $\varphi_{t+q} = \varphi_t$, where the period $q$ is a positive integer. We shall then say that the regression variable forms a periodic pulse train with the pulse shape $\{\varphi_1, \varphi_2, \dots, \varphi_q\}$. It can then be written as
$$\varphi_t = b_0 + b_1e^{2\pi it/q} + \cdots + b_{q-1}e^{2\pi i(q-1)t/q}.$$
The matrix $\{e^{2\pi it\nu/q};\ t,\nu = 0, 1, \dots, q-1\}$ is nonsingular. The spectrum $S$ consists of those points $2\pi\nu/q$ for which the constant $b_\nu$ does not vanish. As $p = 1$, the spectrum is indecomposable. Only if the spectral density is the same at these points will the least squares estimate
$$c_L = \frac{\sum_t y_t\bar\varphi_t}{\sum_t|\varphi_t|^2}$$
be asymptotically efficient. This will in general not happen. The variance of the estimate will be
$$D^2[c_L] \sim \frac1n\,\frac{\sum_{\nu=0}^{q-1}2\pi f\!\left(\frac{2\pi\nu}{q}\right)|b_\nu|^2}{\left[\sum_{\nu=0}^{q-1}|b_\nu|^2\right]^2}$$
asymptotically.
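The loss of efficiency can be made concrete. The Python sketch below (NumPy assumed; names are illustrative) computes $b_\nu$ from one period $\varphi_0, \dots, \varphi_{q-1}$ with `numpy.fft.fft` divided by $q$, evaluates the asymptotic variance of $c_L$ above, and compares it with $\big[n\sum_\nu|b_\nu|^2/2\pi f(2\pi\nu/q)\big]^{-1}$, which we take, as an assumption drawn from the general theory rather than from this section, to be the asymptotic variance of the best linear unbiased estimate. By the Schwarz inequality the two agree only when $f$ takes the same value at all points $2\pi\nu/q$ with $b_\nu\ne0$.

```python
import numpy as np

def pulse_coefficients(phi):
    """b_nu such that phi_t = sum_nu b_nu exp(2 pi i nu t / q), with phi given for t = 0..q-1."""
    phi = np.asarray(phi, dtype=complex)
    return np.fft.fft(phi) / len(phi)

def _freqs_and_weights(phi):
    b = pulse_coefficients(phi)
    q = len(b)
    return 2 * np.pi * np.arange(q) / q, np.abs(b) ** 2

def ls_variance(phi, f, n):
    """(1/n) sum 2 pi f(2 pi nu/q) |b_nu|^2 / (sum |b_nu|^2)^2."""
    lam, w = _freqs_and_weights(phi)
    return np.sum(2 * np.pi * f(lam) * w) / (n * np.sum(w) ** 2)

def efficient_variance(phi, f, n):
    """Assumed BLUE asymptotic variance [n sum |b_nu|^2 / (2 pi f(2 pi nu/q))]^{-1}."""
    lam, w = _freqs_and_weights(phi)
    keep = w > 1e-12 * w.max()       # ignore coefficients that vanish up to rounding
    return 1.0 / (n * np.sum(w[keep] / (2 * np.pi * f(lam[keep]))))
```

The spectral density `f` is assumed to be a vectorized, $2\pi$-periodic function, since the frequencies $2\pi\nu/q$ are taken in $[0, 2\pi)$.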
In another case of interest the regression variable $\varphi_t$ has been obtained as an observation of a strictly stationary process with a spectrum $\varphi(\lambda)$. Assuming this process to be ergodic, it is known (see e.g., Doob [1]) that $\varphi_t$ has a spectrum in the sense of section 7.2 with probability one, and that this spectrum coincides with $\varphi(\lambda)$. Hence, as $\varphi(\lambda)$ will have more than one point of increase, the least squares estimate of $\gamma$ will not be asymptotically efficient in general. The asymptotic variance will be
$$D^2[c_L] \sim \frac{2\pi\int_{-\pi}^{\pi}f(\lambda)\,d\varphi(\lambda)}{n\left[\int_{-\pi}^{\pi}d\varphi(\lambda)\right]^2}.$$

Note that after having observed $\varphi_t$ it is regarded as a sequence of known constants, and we act as if we had completely forgotten that $\varphi_t$ forms a stochastic process. This is a special case of the fixed variate case, which is sometimes dealt with in econometrics.


7.8.  Detection  of  Signals  in  Noise 

We  will  show  in  this  section  how  some  of  the  results  obtained  in  this 
chapter  have  an  immediate  application  to  an  important  problem  in  the 
theory  of  communication.  It  will  be  most  convenient  to  word  the  statements 
in  terms  of  a  continuous  time  parameter. 

A message $y(t)$, $0\le t\le T$, has been received, but due to disturbances in the channel of communication we cannot reconstruct the original signal $s(t)$ directly. Suppose that we know that
$$y(t) = s(t) + x(t),$$
and that the noise $x(t)$ is a stationary process with an absolutely continuous spectrum and a continuous spectral density.

Consider the case of
$$s(t) = \gamma,$$
an unknown constant mean value $\gamma$. What is the best way to find a reasonable estimate of $\gamma$? The result of section 7.5 tells us immediately that, as this corresponds to a regression spectrum having only one element, the point $\lambda = 0$, we will get an asymptotically best estimate (in the sense used throughout this chapter) by constructing the least squares estimate
$$c_L = \frac1T\int_0^T y(t)\,dt.$$

Strictly speaking this was shown only for the case of a discrete time parameter, but a proof of the asymptotic optimality of $c_L$ can be found in Grenander [1]. Various physical devices can be thought of to find $c_L$; we might, for example, average $y(t)$ in a d.c. meter.

Now one could make the following objection. It is quite true that $c_L$ is a best estimate in the large-sample sense, but in practice it will be desirable to keep $T$ fairly small. This may be because the noise cannot be assumed to be stationary over very long time intervals, or because $\gamma$ has a slow secular variation. In either case it would then be natural to ask whether we could find a better estimate.

The following case is simple but illustrative. Suppose that $x(t)$ is generated by white noise passed through a passive linear network with a finite number of loops. We then know (see section 1.8) that the spectral density will be of the form
$$f(\lambda) = \frac{K}{\left|\sum_{\nu=0}^{n}a_\nu(i\lambda)^\nu\right|^2}. \qquad (1)$$

Then the linear unbiased estimate of minimum variance of $\gamma$ can be explicitly given. It is
$$c_0 = \frac{1}{2a_1 + a_0T}\left(\sum_{\nu=0}^{n-1}a_{\nu+1}\left[(-1)^\nu y^{(\nu)}(0) + y^{(\nu)}(T)\right] + a_0\int_0^T y(t)\,dt\right) \qquad (2)$$
(see Grenander [1], p. 242). We see that it differs from $c_L$ mainly in that we have added the values of $y(t)$ and some of its derivatives (which exist in the mean up to the order $n-1$) at the points $t = 0$ and $t = T$. The variance turns out to be
$$D^2[c_0] = \frac{2\pi K}{a_0(2a_1 + a_0T)}.$$


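On the other hand we have an expression for the variance of the least squares estimate in general,
$$D^2[c_L] = \frac{4}{T^2}\int_{-\infty}^{\infty}\frac{\sin^2(\lambda T/2)}{\lambda^2}f(\lambda)\,d\lambda.$$
Now $c_L$ asymptotically behaves like (2), as we know it should. The expression (2) is not very easy to handle, and as $D^2[c_0]$ is not appreciably smaller than $D^2[c_L]$ unless $T$ is quite small, it does not seem worthwhile in general to use $c_0$ instead of $c_L$. If we assume that $f(\lambda)$ has a bounded second derivative in $(-\varepsilon, \varepsilon)$, then, substituting $x = \lambda T/2$ in the integral over $|\lambda|<\varepsilon$,
$$D^2[c_L] = \frac{4}{T^2}\left[\int_{|\lambda|>\varepsilon} + \int_{|\lambda|<\varepsilon}\right]\frac{\sin^2(\lambda T/2)}{\lambda^2}f(\lambda)\,d\lambda = O\!\left(\frac{1}{T^2}\right) + \frac2T\int_{|x|<\varepsilon T/2}\frac{\sin^2x}{x^2}\left[f(0) + \frac{2x}{T}f'(0) + O\!\left(\frac{x^2}{T^2}\right)\right]dx,$$
so that $D^2[c_L] = 2\pi f(0)/T + O(1/T^2)$, since $\int_{-\infty}^{\infty}x^{-2}\sin^2x\,dx = \pi$ and the term in $f'(0)$ vanishes by symmetry.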

The conclusion holds a fortiori if $f(\lambda)$ is of a more complicated form than (1), as it is then not known whether there exists any closed form for the best estimate.
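For the simplest network, $n = 1$ in (1), the comparison between $c_L$ and $c_0$ can be carried out exactly. Assuming $f(\lambda) = K/|a_0 + a_1i\lambda|^2$, the covariance is $r(\tau) = \sigma^2e^{-\beta|\tau|}$ with $\sigma^2 = \pi K/(a_0a_1)$ and $\beta = a_0/a_1$, and integrating $r(t-s)$ over the square $[0,T]^2$ gives $D^2[c_L]$ in closed form. The Python sketch below (names illustrative) evaluates both variances; it is meant only to show how quickly the ratio approaches one, in line with the remark that $c_0$ is worth using only for quite small $T$.

```python
import math

def ou_variances(a0, a1, K, T):
    """Exact D^2[c_L] and D^2[c_0] when f(lam) = K / |a0 + a1 i lam|^2 (the case n = 1 of (1))."""
    sigma2 = math.pi * K / (a0 * a1)    # r(0), the integral of f over the real line
    beta = a0 / a1                      # r(tau) = sigma2 * exp(-beta |tau|)
    bt = beta * T
    # (1/T^2) * double integral of r(t - s) over [0, T]^2
    var_ls = 2 * sigma2 / bt * (1 - (1 - math.exp(-bt)) / bt)
    # variance of the minimum variance unbiased estimate c_0
    var_blue = 2 * math.pi * K / (a0 * (2 * a1 + a0 * T))
    return var_ls, var_blue
```

With $a_0 = a_1 = K = 1$ the ratio $D^2[c_0]/D^2[c_L]$ behaves like $1 - 1/T$, and $T\,D^2[c_L]$ approaches $2\pi f(0) = 2\pi$.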

If  we  instead  choose  s(t)  as  a  periodic  pulse  train  with  an  unknown 
regression  sequence,  the  same  conclusion  cannot  be  drawn.  We  have  seen 
in  section  7.7  that  the  least  squares  estimate  is  then  not  asymptotically 
efficient  so  that  even  for  a  very  long  interval  of  observation  it  is  possible 
to  get  a  substantially  better  estimate.  It  would  then  be  of  interest  to  find 
an  expression  for  such  an  estimate  in  closed  form.  We  would  have  to  know 
the  spectral  density  at  the  frequencies  characteristic  of  the  pulse  but  these 
values  could  be  estimated  by  the  methods  described  in  Chapters  4  and  6. 

The  reader  interested  in  this  topic  should  read  the  papers  by  Davis  [1], 
Rudnick  [1],  Stone  [1]  and  the  book  of  Lawson-Uhlenbeck  [1]. 

7.9.  Confidence  Intervals  and  Tests 

So far in this chapter we have only dealt with point estimation. In a practical situation it will also be desirable to find confidence intervals for the regression coefficients. To illustrate how this can be done in the present context we choose the following simple case.
We suppose that the regression consists of a constant and a trigonometric component of frequency $\lambda\ne0$,
$$y_t = \gamma_1 + \gamma_2\cos t\lambda + x_t.$$
The least squares estimates are given by (see section 7.3)
$$c_L = A\begin{pmatrix}\sum_1^n y_t\\ \sum_1^n\cos t\lambda\,y_t\end{pmatrix},$$
where
$$A = (\Phi^*\Phi)^{-1} = \begin{pmatrix}n & \sum_1^n\cos t\lambda\\ \sum_1^n\cos t\lambda & \sum_1^n\cos^2t\lambda\end{pmatrix}^{-1} = \frac1n\begin{pmatrix}1 + o(1) & o(1)\\ o(1) & 2 + o(1)\end{pmatrix}.$$

We have seen in section 7.5 that $c_L$ is asymptotically efficient with variances of the order $1/n$. It follows then immediately from the asymptotic form of $A$ that the estimate $m = (m_1, m_2)$ with
$$m_1 = \frac1n\sum_1^n y_t,\qquad m_2 = \frac2n\sum_1^n\cos t\lambda\,y_t$$
is also asymptotically efficient, with the covariances
$$D^2[m_1] = \frac{2\pi}{n}f(0) + o(n^{-1}),\qquad D^2[m_2] = \frac{4\pi}{n}f(\lambda) + o(n^{-1}),\qquad \operatorname{cov}[m_1, m_2] = o(n^{-1}). \qquad (1)$$
These estimates are a bit easier to compute than $c_L$.

In any case $m$ is asymptotically efficient, and it seems reasonable to use it for the construction of confidence intervals. Assuming that the noise $x_t$ is a linear process with a positive and continuous spectral density $f(\lambda)$, we know (see Moran [2]) that $m$ is asymptotically normal with mean $(\gamma_1, \gamma_2)$ and covariances given by (1). Let us take a consistent estimate of the spectral density $f(\lambda)$ (see Chapter 4), say $f^*(\lambda)$. Then, on setting
$$z_1 = \sqrt n\,\frac{m_1 - \gamma_1}{\sqrt{2\pi f^*(0)}},\qquad z_2 = \sqrt n\,\frac{m_2 - \gamma_2}{\sqrt{4\pi f^*(\lambda)}},$$
we know that for large samples $z_1$, $z_2$ are approximately normally distributed and independent, with means zero and standard deviations one. We then have the two-dimensional confidence interval
$$m_1 - a_1\sqrt{\frac{2\pi f^*(0)}{n}} < \gamma_1 < m_1 + a_1\sqrt{\frac{2\pi f^*(0)}{n}},\qquad m_2 - a_2\sqrt{\frac{4\pi f^*(\lambda)}{n}} < \gamma_2 < m_2 + a_2\sqrt{\frac{4\pi f^*(\lambda)}{n}},$$
with a confidence coefficient $P = P_1P_2$, where
$$\frac{1}{\sqrt{2\pi}}\int_{-a_j}^{a_j}e^{-z^2/2}\,dz = P_j,\qquad j = 1, 2.$$

The above statement should of course be interpreted as a large-sample relation.
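The procedure above can be sketched in a few lines of Python. In this sketch the spectral density estimates `f0_hat` $= f^*(0)$ and `flam_hat` $= f^*(\lambda)$ are assumed to be supplied by the caller (for instance from a Chapter 4 estimate); the function name and defaults are illustrative. The constants $a_j$ are obtained from the standard normal quantile function in Python's `statistics` module.

```python
import math
from statistics import NormalDist

def trig_regression_ci(y, lam, f0_hat, flam_hat, p1=0.95, p2=0.95):
    """Large-sample confidence rectangle for (gamma_1, gamma_2) in y_t = g1 + g2 cos(t lam) + x_t.

    y[t-1] holds y_t for t = 1..n. Returns the two intervals and P = P1 * P2.
    """
    n = len(y)
    m1 = sum(y) / n
    m2 = 2.0 / n * sum(math.cos(t * lam) * y[t - 1] for t in range(1, n + 1))
    a1 = NormalDist().inv_cdf(0.5 + p1 / 2)       # P(|Z| < a1) = p1
    a2 = NormalDist().inv_cdf(0.5 + p2 / 2)
    h1 = a1 * math.sqrt(2 * math.pi * f0_hat / n)     # uses D^2[m_1] ~ 2 pi f(0) / n
    h2 = a2 * math.sqrt(4 * math.pi * flam_hat / n)   # uses D^2[m_2] ~ 4 pi f(lam) / n
    return (m1 - h1, m1 + h1), (m2 - h2, m2 + h2), p1 * p2
```

Because the half-widths use the asymptotic variances in (1), the stated coverage $P_1P_2$ is itself only a large-sample approximation.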

The above can be extended in an obvious way to more general situations. While this method will serve as a reasonable first approximation, it would be useful to have a careful examination of the range in which this approximation is valid. If one is interested in obtaining tests for linear hypotheses in time series analysis, it is easy to see how such tests can be obtained by slightly modifying the above procedure.
CHAPTER 8


ASSORTED  PROBLEMS 


8.0.  Introduction 

Assorted  problems  that  are  of  interest  are  discussed  in  this  chapter.  Some 
of  the  problems  are  clearly  related  to  topics  discussed  in  previous  chapters. 
We  felt  it  would  be  best  to  include  them  in  a  separate  chapter  as  they  could 
not  be  well  motivated  without  the  extended  development  given  before. 
Other  problems  were  included  because  they  deserve  discussion  even  though 
they  are  not  related  to  topics  discussed  earlier.  At  times  the  discussion  may 
be  heuristic  due  to  our  inability  to  supply  a  rigorous  proof;  whenever  this 
occurs  it  is  clearly  indicated.  Such  heuristic  results  are  included  whenever 
they  are  of  sufficient  interest.  The  problem  of  greatest  interest  is  that  of 
constructing  the  linear  predictor  for  a  stationary  and  purely  nondeterministic 
process  when  the  spectrum  is  not  known  but  has  to  be  estimated  from  the 
sample. 

8.1.  Prediction  When  the  Conjectured  Spectrum  is  Not  the  True  One 

We have seen in Chapter 2 how to construct the best linear predictor for a stationary and purely nondeterministic process, under the assumption that its spectral density is known. As the true spectrum is very often not known a priori, it is of considerable interest to see what happens when a predictor is computed on the basis of a conjectured spectral density $g(\lambda)$ when the true one is $f(\lambda)$. We know that the mean square error of the best linear predictor of $x_1$, given that $\dots, x_{-2}, x_{-1}, x_0$ have been observed, is
$$\sigma^2 = 2\pi\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f(\lambda)\,d\lambda\right\} > 0.$$
If the spectrum is not known but is assumed to be
$$g(\lambda)\ne f(\lambda),\qquad \int_{-\pi}^{\pi}|\log g(\lambda)|\,d\lambda < \infty,$$
prediction based on this assumption will of course not be as good as prediction making use of knowledge of the true spectrum. We can in fact
write down explicitly the predictor which appears to be the best if $g(\lambda)$ is assumed to be the true spectral density. It is (see section 2.2)
$$\hat x_1 = \int_{-\pi}^{\pi}e^{i\lambda}\left[1 - \frac{c(0)}{c(e^{-i\lambda})}\right]dz(\lambda), \qquad (1)$$
where $z(\lambda)$ is the spectral process with orthogonal increments corresponding to $x_t$. The function $c(z)$ was defined in section 2.2, and we remind the reader of the relation
$$|c(e^{-i\lambda})|^2 = 2\pi g(\lambda).$$
Here $c(z)$ is constructed from $g(\lambda)$, not $f(\lambda)$. The function $c(e^{-i\lambda})$ should be interpreted as the boundary value of the function $c(z)$ on the unit circle in the complex plane, as approached radially from within.

Another condition necessary to give integral (1) meaning is
$$\int_{-\pi}^{\pi}\frac{f(\lambda)}{g(\lambda)}\,d\lambda < \infty.$$
Then the integrand in (1) is quadratically integrable with respect to $f(\lambda)$, so that integral (1) is well defined as an element of $H_\infty$. As the integrand is essentially a trigonometric series involving only the nonpositive powers of $e^{i\lambda}$, $\hat x_1$ belongs to $H_0$.

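The mean square error of prediction is then
$$E|x_1 - \hat x_1|^2 = \frac{|c(0)|^2}{2\pi}\int_{-\pi}^{\pi}\frac{f(\lambda)}{g(\lambda)}\,d\lambda = \sigma^2\,\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f(\lambda)}{g(\lambda)}\,d\lambda\,\exp\left\{-\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\frac{f(\lambda)}{g(\lambda)}\,d\lambda\right\}. \qquad (2)$$
The factor multiplying $\sigma^2$ is greater than or equal to one by Jensen's inequality. Equality is obtained if and only if $f(\lambda)$ and $g(\lambda)$ are such that
$$\frac{f(\lambda)}{g(\lambda)} = \text{const}$$
almost everywhere.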
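The factor multiplying $\sigma^2$ in (2) is easy to evaluate numerically. The Python sketch below (NumPy assumed; names illustrative) approximates $\frac1{2\pi}\int_{-\pi}^{\pi}$ by the mean over a midpoint grid, which is very accurate for smooth periodic integrands. As a check, if $x_t$ is a first-order autoregression $x_t = 0.5x_{t-1} + \varepsilon_t$, so that $2\pi f(\lambda)\propto 1/(1.25 - \cos\lambda)$, but $g$ is taken constant (white noise), the factor is $1/(1 - 0.5^2) = 4/3$, the ratio of the variance of $x_t$ to the one-step innovation variance.

```python
import numpy as np

def mse_inflation(f, g, grid=4096):
    """Factor in (2): mean(f/g) * exp(-mean(log(f/g))), means taken over [-pi, pi]."""
    lam = -np.pi + 2 * np.pi * (np.arange(grid) + 0.5) / grid   # midpoint rule
    ratio = f(lam) / g(lam)
    return np.mean(ratio) * np.exp(-np.mean(np.log(ratio)))
```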
8.2. Uniform Convergence of the Estimated Spectral Density to the True Spectral Density

Suppose that we use a spectrograph estimate $f_N^*(\lambda)$ as described in Chapter 4 to estimate the unknown spectral density $f(\lambda)$. We assume that the spectral density $f(\lambda)$ is positive and has a bounded second derivative. Let $f_N^*(\lambda)$ be a spectrograph estimate with positive weight function
$$w_N(x - \lambda) = \frac{1}{2\pi}\sum_{\nu=-N}^{N}w_\nu^{(N)}e^{i\nu(x-\lambda)},\qquad w_\nu^{(N)} = w_{-\nu}^{(N)},\qquad \int_{-\pi}^{\pi}w_N(x - \lambda)\,dx = 1.$$
The mass of the weight function must concentrate in the neighborhood of $\lambda$ as $N\to\infty$ to ensure consistency of the estimate, but it should not do this too fast. We therefore assume that
$$\sum_{|\nu|<N}|w_\nu^{(N)}| = o(\sqrt N). \qquad (1)$$
The observed process is supposed to be linear.

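Under these assumptions we will show that $f_N^*(\lambda)$ converges in probability to $f(\lambda)$ uniformly in $\lambda$. In fact
$$2\pi\left[f(\lambda) - f_N^*(\lambda)\right] = \sum_{|\nu|<N}r_\nu\left(1 - w_\nu^{(N)}\right)e^{i\nu\lambda} + \sum_{|\nu|\ge N}r_\nu e^{i\nu\lambda} + \sum_{|\nu|<N}(r_\nu - r_\nu^*)e^{i\nu\lambda}w_\nu^{(N)} = S_1 + S_2 + S_3,$$
where $r_\nu^* = C_\nu/N$. But
$$|S_1| \le \sum_{|\nu|<N}|r_\nu|\,\left|1 - w_\nu^{(N)}\right|.$$
As $\sum_{-\infty}^{\infty}|r_\nu| < \infty$ ($r_\nu = O(\nu^{-2})$ since $f''(\lambda)$ is bounded), as $|w_\nu^{(N)}|\le w_0^{(N)} = 1$ for a positive weight function, and as for each fixed $\nu$ we have $w_\nu^{(N)}\to1$ with increasing sample size, it follows that
$$\lim_{N\to\infty}\max_\lambda|S_1| = 0.$$
It is also clear that
$$|S_2| \le \sum_{|\nu|\ge N}|r_\nu| = O(N^{-1}),$$
so that $\lim_{N\to\infty}\max_\lambda|S_2| = 0$.

The third sum, which contains the random part, can be shown to be uniformly small with great probability. We have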
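$$E\max_\lambda|S_3| \le \sum_{|\nu|<N}|w_\nu^{(N)}|\,E|r_\nu^* - r_\nu|.$$
But for a process of the type considered one can show that
$$\operatorname{Var}(r_\nu^*) \le \frac AN,$$
where the constant $A$ is independent of $\nu$. Hence
$$E|r_\nu^* - r_\nu| \le E\left|r_\nu^* - \frac{N-|\nu|}{N}r_\nu\right| + \frac{|\nu|}{N}|r_\nu| \le \sqrt{\frac AN} + \frac{|\nu|\,|r_\nu|}{N},$$
so that
$$E\max_\lambda|S_3| \le \sqrt{\frac AN}\sum_{|\nu|<N}|w_\nu^{(N)}| + \frac1N\sum_{|\nu|<N}|\nu|\,|r_\nu|\,|w_\nu^{(N)}| = o(1),$$
by (1) and the boundedness of $|\nu|\,|r_\nu|$. Thus, given any $\varepsilon > 0$,
$$P\left\{\max_\lambda|f_N^*(\lambda) - f(\lambda)| < \varepsilon\right\}\to1$$
as $N\to\infty$, so that $f_N^*(\lambda)$ converges in probability to $f(\lambda)$ uniformly in $\lambda$ as $N\to\infty$.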
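As an illustration of an estimate satisfying these assumptions, the Python sketch below (NumPy assumed; names illustrative) uses the Bartlett weights $w_\nu^{(N)} = 1 - |\nu|/M$, $|\nu| < M$, whose weight function is a nonnegative Fejér kernel, with the choice $M = [N^{0.4}]$, so that $\sum|w_\nu^{(N)}| \approx M = o(\sqrt N)$ as required by (1). These particular weights are one admissible choice, not the only one. Since the estimate equals the periodogram smoothed by a nonnegative kernel, it should be nonnegative, and its average over a uniform grid of more than $2M$ points should reproduce $r_0^*/2\pi$.

```python
import numpy as np

def biased_autocov(x, max_lag):
    """r*_nu = C_nu / N = (1/N) sum_t x_t x_{t+nu}, nu = 0..max_lag (zero mean assumed)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[: N - v], x[v:]) / N for v in range(max_lag + 1)])

def bartlett_spectrograph(x, lam, M):
    """f*_N(lam) = (1/2pi) sum_{|nu|<M} (1 - |nu|/M) r*_nu e^{i nu lam}, evaluated at each lam."""
    r = biased_autocov(x, M - 1)
    nu = np.arange(1, M)
    w = 1 - nu / M
    lam = np.atleast_1d(lam)
    # symmetric weights and autocovariances give a real cosine series
    return (r[0] + 2 * np.cos(np.outer(lam, nu)) @ (w * r[1:])) / (2 * np.pi)
```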


8.3.  The  Asymptotic  Distribution  of  an  Integral  of  a  Spectrograph  Estimate 

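Consider the statistic
$$\int_{-\pi}^{\pi}f_N^*(\lambda)\,dg(\lambda) = \int\!\!\int I_N(l)\,w_N(\lambda - l)\,dl\,dg(\lambda), \qquad (1)$$
where $dg(\lambda) = dg(-\lambda)$ and $\int_{-\pi}^{\pi}|dg(\lambda)| < \infty$. Let the observed process $x_t$ be normal. We also assume that
$$w_\nu^{(N)} = 0\quad\text{when } |\nu| > [N^\alpha] \qquad (2)$$
for some $\alpha$, $\tfrac12 < \alpha < 1$, and that, given any $\varepsilon > 0$,
$$|w_N(x - \lambda)|\to0$$
uniformly as $N\to\infty$ when $|x - \lambda| > \varepsilon$. The asymptotic distribution of (1) as $N\to\infty$ will be obtained under these assumptions.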
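Consider
$$\int\!\!\int 2\pi f(l)\,I_{N,\xi}(l)\,w_N(\lambda - l)\,dl\,dg(\lambda) \qquad (3)$$
as an approximation to (1). The difference between (1) and (3) is
$$\frac{1}{2\pi N}\sum_{r,s}a_ra_s\,d_{rs},$$
where
$$d_{rs} = \sum_{n,m=1}^{N}\xi_{n-r}\xi_{m-s}\,w_{n-m}^{(N)}g_{n-m} - \sum_{n=1+r}^{N+r}\ \sum_{m=1+s}^{N+s}\xi_{n-r}\xi_{m-s}\,w_{n-m}^{(N)}g_{n-m}$$
and $g_\nu = \int e^{i\nu\lambda}\,dg(\lambda)$. Now $E|d_{rs}| \le 2(|r| + |s|)\sum_\nu|w_\nu^{(N)}|\,|g_\nu|$, so that
$$E\left|\frac{1}{2\pi N}\sum_{r,s=-\infty}^{\infty}a_ra_s\,d_{rs}\right| \le \frac{1}{\pi N}\sum_{r,s}(|r| + |s|)\,|a_r|\,|a_s|\sum_\nu|w_\nu^{(N)}|\,|g_\nu| = o\!\left(\frac{1}{\sqrt N}\right).$$
Indeed, since $|f''(\lambda)|$ is bounded, the sequence $|g_\nu|$ is uniformly bounded and (8.2.1) holds. The difference between (1) and (3) is then $o(1/\sqrt N)$ with probability close to one; this is negligible, as we shall see. Let us now consider the approximation (3), which can be written
$$\frac{1}{2\pi N}\sum_{n,m=1}^{N}\xi_n\xi_m\sum_{\nu=-N}^{N}r_{\nu-(n-m)}w_\nu^{(N)}g_\nu = \frac{1}{2\pi N}\sum_j\left(\sum_{\nu=-N}^{N}r_{\nu-j}w_\nu^{(N)}g_\nu\right)\sum_{n-m=j}\xi_n\xi_m. \qquad (4)$$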

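The variance of (4) is
$$\frac{1}{\pi^2}\sum_{j=0}^{N-1}{}'\,\frac{N-j}{N^2}\left(\sum_{\nu=-N}^{N}r_{\nu-j}w_\nu^{(N)}g_\nu\right)^2,$$
where the primed summation indicates that the coefficient for $j = 0$ is half that for $j\ne0$. Because of (2) it follows that
$$\left|\sum_{\nu=-N}^{N}r_{\nu-j}w_\nu^{(N)}g_\nu\right| \le K\sum_{|\nu|\le[N^\alpha]}|r_{\nu-j}| \le \frac{K'}{N^\beta}$$
if $|j| > [N^\beta]$, $1 > \beta > \alpha$. But then
$$\frac1N\sum_{[N^\beta]<|j|\le N}\left(\sum_\nu r_{\nu-j}w_\nu^{(N)}g_\nu\right)^2 \le \frac{2K'^2}{N^{2\beta}},$$
so that
$$\frac{1}{2\pi N}\sum_{|n-m|>[N^\beta]}\xi_n\xi_m\,h_{n-m} = o\!\left(\frac{1}{\sqrt N}\right)$$
in probability as $N\to\infty$, which is negligible. It is therefore enough to consider
$$\frac{1}{\pi N}\sum_{j=0}^{[N^\beta]}{}'\,h_j\sum_{n=1}^{N-j}\xi_n\xi_{n+j},\qquad\text{where } h_j = \sum_{\nu=-N}^{N}r_{\nu-j}w_\nu^{(N)}g_\nu.$$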


Note that
$$\frac{h_0}{2\pi} = \int\!\!\int f(l)\,w_N(\lambda - l)\,dl\,dg(\lambda)\to\int f(\lambda)\,dg(\lambda)$$
as $N\to\infty$, so that


N 

2 


2    V~  2  *? 


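as $N\to\infty$. Now
$$\frac{1}{\pi N}\sum_{j=0}^{[N^\beta]}{}'\,h_j\sum_{n=1}^{N-j}\xi_n\xi_{n+j} = \frac{h_0}{2\pi N}\sum_{n=1}^{N}\xi_n^2 + \frac1\pi\sum_{j=1}^{[N^\beta]}\sum_{n=1}^{N-j}\frac{h_j}{N}\xi_n\xi_{n+j}$$
$$= \frac1\pi\left[\frac{h_0}{2N}\sum_{n=1}^{N-[N^\beta]}\xi_n^2 + \frac1N\sum_{n=1}^{N-[N^\beta]}\sum_{j=1}^{[N^\beta]}h_j\xi_n\xi_{n+j}\right] + \frac1\pi\left[\frac{h_0}{2N}\sum_{n=N-[N^\beta]+1}^{N}\xi_n^2 + \frac1N\sum_{n=N-[N^\beta]+1}^{N}\ \sum_{j=1}^{N-n}h_j\xi_n\xi_{n+j}\right].$$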

The  variance  of  the  second  bracketed  term  is  less  than 


N2 


N  /  1       N         \ 


It will therefore be enough to consider the first bracketed term, which will be shown to be asymptotically normal with mean zero and variance $\frac{1}{2\pi^2N}\sum_{j=-N}^{N}h_j^2$. In dealing with this term, consider blocks in $n$ of length $[N^\gamma]$, $1 > \gamma > \beta$, separated by blocks of length $[N^\beta]$. There will be roughly $N^{1-\gamma}$ blocks of each kind. The blocks of length $N^\gamma$ are independent of each other since they are separated from each other by blocks of length $N^\beta$. The variance of a block of length $N^\gamma$ is asymptotically the same as
$$\frac{N^\gamma}{2\pi^2N^2}\sum_{j=-N}^{N}h_j^2.$$
Adding over all blocks of length $N^\gamma$ we have
$$\frac{1}{2\pi^2}\,\frac1N\sum_{j=-N}^{N}h_j^2.$$
Since the blocks are independent, it is clear that the sum of the blocks is asymptotically normal with mean zero and variance
$$\frac{1}{2\pi^2}\,\frac1N\sum_{j=-N}^{N}h_j^2.$$
The sum of the blocks of length $N^\beta$ which have been left out has a variance asymptotically the same as
$$\frac{N^{\beta-\gamma}}{2\pi^2N}\sum_{j=-N}^{N}h_j^2,$$
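so that it can be neglected. All the approximations made can now be seen to have been justified. It is now clear that
$$\int_{-\pi}^{\pi}f_N^*(\lambda)\,dg(\lambda)$$
is asymptotically normally distributed with mean
$$\int\!\!\int f(l)\,w_N(\lambda - l)\,dl\,dg(\lambda)$$
and variance
$$\frac{1}{2\pi^2N}\sum_{j=-N}^{N}h_j^2 \sim \frac{4\pi}{N}\int_{-\pi}^{\pi}f^2(l)\left(\int_{-\pi}^{\pi}w_N(\lambda - l)\,dg(\lambda)\right)^2dl.$$

A case of particular interest is that in which $g(\lambda)$ is a jump function with jump $c_j$ at $\lambda_j$, where $0 < \lambda_1 < \cdots < \lambda_k < \pi$. Then
$$\int f_N^*(\lambda)\,dg(\lambda) = \sum_{j=1}^{k}c_jf_N^*(\lambda_j).$$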
Our result then implies that $f_N^*(\lambda_j)$, $j = 1, \dots, k$, are asymptotically normally distributed and independent, with means
$$\int_{-\pi}^{\pi}w_N(\lambda_j - l)f(l)\,dl,\qquad j = 1, \dots, k,$$
and variances
$$\frac{2\pi}{N}f^2(\lambda_j)\int_{-\pi}^{\pi}w_N^2(x)\,dx,\qquad j = 1, \dots, k.$$
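The variance above depends on the weights only through $\int w_N^2(x)\,dx$, which by Parseval's relation equals $\frac1{2\pi}\sum_\nu\big(w_\nu^{(N)}\big)^2$. The Python sketch below (NumPy assumed; names illustrative) evaluates this both ways for symmetric weights given as $w_0, w_1, \dots$ and forms the asymptotic variance $\frac{2\pi}N f^2(\lambda_j)\int w_N^2$, intended for $0 < \lambda_j < \pi$ and a kernel concentrated near $\lambda_j$, as assumed in the text.

```python
import numpy as np

def weight_function(x, w):
    """w_N(x) = (1/2pi) sum_nu w_nu e^{i nu x}, with w = [w_0, w_1, ...] and w_{-nu} = w_nu."""
    nu = np.arange(1, len(w))
    return (w[0] + 2 * np.cos(np.outer(np.atleast_1d(x), nu)) @ w[1:]) / (2 * np.pi)

def int_w_squared(w, grid=8192):
    """Integral of w_N^2 over [-pi, pi] by the midpoint rule (exact for these trig polynomials)."""
    x = -np.pi + 2 * np.pi * (np.arange(grid) + 0.5) / grid
    return 2 * np.pi * np.mean(weight_function(x, w) ** 2)

def parseval_w_squared(w):
    """The same integral via Parseval: (1/2pi) sum over all nu of w_nu^2."""
    w = np.asarray(w, dtype=float)
    return (w[0] ** 2 + 2 * np.sum(w[1:] ** 2)) / (2 * np.pi)

def asymptotic_var(f_at_lam, w, N):
    """(2 pi / N) f(lam)^2 * integral of w_N^2, for 0 < lam < pi."""
    return 2 * np.pi / N * f_at_lam ** 2 * parseval_w_squared(w)
```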

This result still holds, and can be proved in essentially the same way, when $x_t$ is a linear process such that
$$\sum_r r^2|a_r| < \infty.$$
If $x_t$ is normal, it can be seen that any linear form in the $x_t$, and $f_N^*(\lambda_1), \dots, f_N^*(\lambda_k)$, are asymptotically normal and independent. This is clearly useful in connection with statistical problems of well-known type. Thus $z(\lambda_j)$, $f_N^*(\lambda_j)$, $j = 1, \dots, k$, are asymptotically independent. It then seems plausible that the two processes $z(\lambda)$ and $f_N^*(\lambda)$ are asymptotically independent. This has not been proven but seems to be indicated by the reasoning above.

We shall discuss the mean square error of prediction when the spectrum is estimated in the next section. The computation will be carried out in a heuristic manner, as if there were independence between $z(\lambda)$ and $f_N^*(\lambda)$ for finite sample size.

8.4.  The  Mean  Square  Error  of  Prediction  when  the  Spectrum  is  Estimated 

The mean square error of prediction is computed heuristically when the observed process is normal and the spectrum is estimated. The weight function of the estimate is positive, so that we have a positive estimate of the spectrum. The error of prediction is
$$E|\hat x_1 - x_1|^2 = E\left|\int_{-\pi}^{\pi}e^{i\lambda}\,\frac{c(0)}{c(e^{-i\lambda})}\,dz(\lambda)\right|^2.$$
The function $c(z)$ is a functional of $f_N^*(\lambda)$, the estimated spectral density. Moreover
$$E|dz(\lambda)|^2 = f(\lambda)\,d\lambda.$$
We have already noted that it seems plausible that $f_N^*(\lambda)$, $z(\lambda)$ are asymptotically independent as $N\to\infty$. We shall assume that the approximation is good if we carry out the computation as if $z(\lambda)$, $f_N^*(\lambda)$ were independent for finite $N$, and thus consider
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n 

I 


E 


c(0) 


ik\ 


c{e~lA) 


f(X)dX. 


Let  us  now  mention  that  there  is  one  situation  where  this  assumption  is 
automatically  fufilled,  viz.  when  /*(A)  is  obtained  from  a  realization  of 
the  process  xt  that  is  independent  of  the  realization  used  for  prediction. 
Of  course,  this  case  is  of  considerable  practical  interest. 

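Now

$$\left|\frac{c(0)}{c(e^{-i\lambda})}\right|^2=\frac{\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f_N^*(\mu)\,d\mu\right\}}{f_N^*(\lambda)}.$$

We use the fact that

$$\max_{\lambda}|f_N^*(\lambda)-f(\lambda)|\to 0$$

in probability as N → ∞, and make several approximations. We are interested in the expected value of

$$\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f_N^*(\lambda)\,d\lambda\right\}\int_{-\pi}^{\pi}\frac{f(\lambda)}{f_N^*(\lambda)}\,d\lambda.\qquad(1)$$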


This amounts to replacing g(λ) by f*_N(λ) in (8.1.2) and taking the mean value of the resultant expression. Let

$$h(\lambda)=\frac{f(\lambda)-f_N^*(\lambda)}{f(\lambda)}.$$


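Since f*_N(λ) = f(λ)(1 - h(λ)), expression (1) is the product of 2π exp{(1/2π) ∫_{-π}^{π} log f(λ) dλ} and a second factor. That second factor can be written in terms of h as

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{d\lambda}{1-h(\lambda)}\,\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\bigl(1-h(\lambda)\bigr)\,d\lambda\right\}$$

$$\approx\frac{1}{2\pi}\left\{2\pi+\int_{-\pi}^{\pi}h(\lambda)\,d\lambda+\int_{-\pi}^{\pi}h^2(\lambda)\,d\lambda+o\left(\int_{-\pi}^{\pi}|h(\lambda)|^2\,d\lambda\right)\right\}\times\exp\left\{-\frac{1}{2\pi}\int_{-\pi}^{\pi}h(\lambda)\,d\lambda-\frac{1}{4\pi}\int_{-\pi}^{\pi}h^2(\lambda)\,d\lambda+o\left(\int_{-\pi}^{\pi}|h(\lambda)|^2\,d\lambda\right)\right\}$$

$$\approx\frac{1}{2\pi}\left\{2\pi+\int_{-\pi}^{\pi}h(\lambda)\,d\lambda+\int_{-\pi}^{\pi}h^2(\lambda)\,d\lambda+o\left(\int_{-\pi}^{\pi}|h(\lambda)|^2\,d\lambda\right)\right\}\times\left\{1-\frac{1}{2\pi}\int_{-\pi}^{\pi}h(\lambda)\,d\lambda-\frac{1}{4\pi}\int_{-\pi}^{\pi}h^2(\lambda)\,d\lambda+\frac{1}{8\pi^2}\left(\int_{-\pi}^{\pi}h(\lambda)\,d\lambda\right)^2+r\right\}$$

$$=1+\frac{1}{4\pi}\int_{-\pi}^{\pi}h^2(\lambda)\,d\lambda-\frac{1}{8\pi^2}\left(\int_{-\pi}^{\pi}h(\lambda)\,d\lambda\right)^2+R_2,\qquad(2)$$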


where |R_2| is bounded by a finite sum of terms of the form

$$\left|\int_{-\pi}^{\pi}h(\lambda)\,d\lambda\right|\int_{-\pi}^{\pi}h^2(\lambda)\,d\lambda,\qquad\int_{-\pi}^{\pi}|h(\lambda)|^3\,d\lambda,\quad\text{etc.}$$

Since f*_N(λ) converges uniformly to f(λ) in probability, it follows that |R_2| is much smaller than the second term of (2) with high probability. We shall assume that E|R_2| is much smaller than the expected value of the second term of (2) as N → ∞.
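As a numerical check on the bookkeeping behind (2), the sketch below evaluates the second factor of (1) for a fixed, hypothetical relative error h(λ) and compares it with the two quadratic terms of (2). It assumes NumPy and a rectangle-rule quadrature, which is very accurate for smooth periodic integrands. The quantity printed as R2 is the discrepancy that (2) absorbs into R_2. If R_2 is of third order in h, halving h should shrink it by roughly a factor of eight.

```python
import numpy as np

M = 4096                                              # grid size for the rectangle rule
lam = -np.pi + 2 * np.pi * np.arange(M) / M           # equispaced points on [-pi, pi)


def integrate(values):
    """Rectangle-rule approximation of the integral over [-pi, pi) of a smooth periodic function."""
    return values.sum() * 2 * np.pi / values.size


def second_factor(h):
    """(1/2pi) int dlam/(1-h) * exp{(1/2pi) int log(1-h) dlam}, the factor multiplying sigma^2 in (1)."""
    return (integrate(1.0 / (1.0 - h)) / (2 * np.pi)
            * np.exp(integrate(np.log(1.0 - h)) / (2 * np.pi)))


def expansion(h):
    """Right-hand side of (2) without R_2: 1 + (1/4pi) int h^2 - (1/8pi^2) (int h)^2."""
    return 1 + integrate(h ** 2) / (4 * np.pi) - integrate(h) ** 2 / (8 * np.pi ** 2)


h = 0.02 + 0.05 * np.cos(lam)                         # hypothetical small relative error h(lambda)
for scale in (1.0, 0.5):
    exact, approx = second_factor(scale * h), expansion(scale * h)
    print(f"scale={scale}: exact-1={exact - 1:.3e}, approx-1={approx - 1:.3e}, R2={exact - approx:.2e}")
```

The second factor can never fall below 1. The reason is that the arithmetic mean of 1/(1 - h) dominates its geometric mean.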


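Let

$$b_N(x)=Ef_N^*(x)-f(x),$$

so that

$$h(x)=\frac{Ef_N^*(x)-f_N^*(x)-b_N(x)}{f(x)}.$$

Then

$$E\int_{-\pi}^{\pi}h^2(x)\,dx=\int_{-\pi}^{\pi}\frac{b_N^2(x)}{f^2(x)}\,dx+\int_{-\pi}^{\pi}\frac{E[f_N^*(x)-Ef_N^*(x)]^2}{f^2(x)}\,dx\approx\int_{-\pi}^{\pi}\frac{b_N^2(x)}{f^2(x)}\,dx+\frac{4\pi^2}{N}\int_{-\pi}^{\pi}w_N^2(x)\,dx,$$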


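while

$$E\left(\int_{-\pi}^{\pi}h(x)\,dx\right)^2=\left(\int_{-\pi}^{\pi}\frac{b_N(x)}{f(x)}\,dx\right)^2+E\left(\int_{-\pi}^{\pi}\frac{Ef_N^*(x)-f_N^*(x)}{f(x)}\,dx\right)^2\approx\left(\int_{-\pi}^{\pi}\frac{b_N(x)}{f(x)}\,dx\right)^2+\frac{4\pi}{N}\int_{-\pi}^{\pi}f^2(x)\left(\int_{-\pi}^{\pi}\frac{w_N(x-\lambda)}{f(\lambda)}\,d\lambda\right)^2dx.$$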
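Under the assumptions we have made the error of prediction is given by

$$\sigma^2\left\{1+\frac{1}{4\pi}\int_{-\pi}^{\pi}\frac{b_N^2(x)}{f^2(x)}\,dx+\frac{\pi}{N}\int_{-\pi}^{\pi}w_N^2(x)\,dx-\frac{1}{8\pi^2}\left(\int_{-\pi}^{\pi}\frac{b_N(x)}{f(x)}\,dx\right)^2-\frac{1}{2\pi N}\int_{-\pi}^{\pi}f^2(x)\left(\int_{-\pi}^{\pi}\frac{w_N(x-\lambda)}{f(\lambda)}\,d\lambda\right)^2dx\right\}$$

up to terms of smaller order. Here σ² = 2π exp{(1/2π) ∫_{-π}^{π} log f(λ) dλ} is the one-step prediction error attained when f is known. If this argument could be made rigorous, the result would be useful for deciding which spectrograph estimate to use when constructing optimal predictors.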
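The sketch below evaluates the quantity inside the expectation in (1) when a fixed, non-random density g is used in place of f*_N. It assumes NumPy and uses first order autoregressive densities, a hypothetical choice made because the answer is then known in closed form. Take f to be the density of x_t = 0.5 x_{t-1} + e_t and g the density with coefficient 0.3. The value should then be (1 + 0.3² - 2·0.3·0.5)/(1 - 0.5²), which is the mean square error of the predictor 0.3 x_{t-1}.

```python
import numpy as np

M = 4096
lam = -np.pi + 2 * np.pi * np.arange(M) / M
dlam = 2 * np.pi / M


def ar1_density(a, sigma2=1.0):
    """Spectral density of x_t = a x_{t-1} + e_t, Var e_t = sigma2, normalised so that int f = Var x_t."""
    return sigma2 / (2 * np.pi * np.abs(1 - a * np.exp(-1j * lam)) ** 2)


def plug_in_error(f_vals, g_vals):
    """exp{(1/2pi) int log g} * int f/g: one-step error of the predictor built from g when f is true."""
    geometric_mean = np.exp(np.sum(np.log(g_vals)) * dlam / (2 * np.pi))
    return geometric_mean * np.sum(f_vals / g_vals) * dlam


f = ar1_density(0.5)
print(plug_in_error(f, f))                      # approximately sigma^2 = 1, the innovation variance
print(plug_in_error(f, ar1_density(0.3)))       # approximately 0.79 / 0.75
print(plug_in_error(f, 3 * ar1_density(0.3)))   # unchanged: a constant factor in g cancels
```

The value never falls below σ². This is why the correction terms in the expression above should, on balance, be nonnegative.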

8.5.  Other  Types  of  Estimates  of  the  Spectrum 

The estimated spectral density f*_N(λ) is a trigonometric polynomial in λ. In computing the corresponding linear predictor we run into practical difficulties: unless the order of the polynomial is very low, this computation becomes quite tedious.

Instead one might start out from another class of estimates

$$f_N^*(\lambda)=\left|\sum_{\nu=0}^{p}g_\nu^{(N)}e^{i\nu\lambda}\right|^{-2}=|g(e^{i\lambda})|^{-2},$$

where the coefficients g_ν^{(N)} of g(e^{iλ}) depend on the observations. f*_N(λ) can be considered as the spectral density of an autoregressive scheme. Such a process would satisfy the equation

$$\sum_{\nu=0}^{p}g_\nu^{(N)}x_{t-\nu}=\eta_t,$$

where η_t is an orthogonal process. The best predictor would be given by

$$\hat{x}_1^*=-\frac{1}{g_0^{(N)}}\sum_{\nu=1}^{p}g_\nu^{(N)}x_{1-\nu}.$$

If f*_N(λ) is to be used mainly for prediction, this class of estimates would seem to be adequate.
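The sketch below writes this class of estimates down in code for a given coefficient vector (g_0, ..., g_p). It assumes NumPy, and the coefficient values in the usage lines are hypothetical. It evaluates f*_N(λ) = |g(e^{iλ})|^{-2} and the one-step predictor from the display above. It also checks that g(z) has no zeros in |z| ≤ 1, which is the condition under which the autoregressive equation gives η_1 orthogonal to the past and the predictor is the best one.

```python
import numpy as np


def ar_form_density(g, lam):
    """f*_N(lambda) = |sum_nu g_nu e^{i nu lambda}|^{-2} for coefficients g = (g_0, ..., g_p)."""
    g = np.asarray(g, dtype=float)
    nu = np.arange(g.size)
    return 1.0 / np.abs(np.exp(1j * np.outer(lam, nu)) @ g) ** 2


def zeros_outside_unit_circle(g):
    """True if g(z) = sum_nu g_nu z^nu has all its zeros in |z| > 1 (np.roots wants the highest power first)."""
    g = np.asarray(g, dtype=float)
    return bool(np.all(np.abs(np.roots(g[::-1])) > 1))


def one_step_predictor(g, past):
    """x_hat_1 = -(1/g_0) sum_{nu=1}^p g_nu x_{1-nu}, with past = (x_0, x_{-1}, ..., x_{1-p})."""
    g = np.asarray(g, dtype=float)
    past = np.asarray(past, dtype=float)
    return float(-np.dot(g[1:], past[: g.size - 1]) / g[0])


g = [1.0, -0.5]                        # hypothetical coefficients: x_t - 0.5 x_{t-1} = eta_t
print(zeros_outside_unit_circle(g))    # True: the only zero is z = 2
print(ar_form_density(g, np.array([0.0])))
print(one_step_predictor(g, [2.0]))    # 1.0
```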

At present almost nothing is known about the properties of such estimates. One could ask whether the estimate obtained by the application of least squares

$$\sum_{t=p+1}^{N}(x_t-g_1x_{t-1}-\cdots-g_px_{t-p})^2=\text{minimum}$$

would be a good estimate and how p should be chosen in a given situation.
At present we cannot consider such questions, still less the analogue of the problem discussed in the last section. Let us note that this indicates, as we have stressed in previous chapters, that there is no point in asking what is the best estimate unless we specify how we are going to use it. It may be used for prediction as in the last section, for estimating the spectral mass in a certain frequency band, for constructing a filter, or for some other purpose. In each case we will in general be led to different estimates. More often still, we will want to use an estimate for several purposes, and then we will have to weigh the importance of the different applications before choosing one.
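The least squares criterion displayed above can be sketched as an ordinary linear regression of x_t on x_{t-1}, ..., x_{t-p}. The sketch assumes NumPy. The simulated second order scheme is hypothetical and serves only to exercise the fit. It says nothing about the open questions of how good the estimate is or how p should be chosen.

```python
import numpy as np


def least_squares_ar(x, p):
    """Minimise sum_{t=p+1}^N (x_t - g_1 x_{t-1} - ... - g_p x_{t-p})^2 over (g_1, ..., g_p)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Row for (0-based) time t = p, ..., n-1 contains x_{t-1}, ..., x_{t-p}.
    design = np.column_stack([x[p - k: n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef


rng = np.random.default_rng(1)
n, true_coef = 20000, np.array([0.5, -0.3])     # hypothetical AR(2) used only to exercise the fit
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = true_coef[0] * x[t - 1] + true_coef[1] * x[t - 2] + e[t]

g_hat = least_squares_ar(x, 2)
print(g_hat)                                     # close to [0.5, -0.3] for a sample this long
```

To pass to the notation Σ_ν g_ν^{(N)} x_{t-ν} = η_t, take g_0^{(N)} = 1 and g_ν^{(N)} = -ĝ_ν. The predictor then becomes Σ_{ν=1}^{p} ĝ_ν x_{1-ν}.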

The following situation probably does not occur often in the sort of applications we have in mind, but it is of interest since it has been discussed in the literature. Suppose we want to find whether the process observed is a low order moving average or autoregressive scheme. It would in fact be more in the spirit of this monograph to ask whether the assumption is satisfied within the limits of error one is willing to allow. We might then plot f*_N(λ) and 1/f*_N(λ) as functions of cos λ. If the assumption were approximately correct, we would expect one of these curves to be close to a polynomial of low degree in cos λ. In the case of first order schemes we would get a straight line.
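The reason for plotting against cos λ can be checked with exact, known densities in place of f*_N. The sketch assumes NumPy, and the parameter values are hypothetical. A first order moving average has f(λ) = (1 + b² + 2b cos λ)/2π, which is linear in cos λ. A first order autoregression has 1/f(λ) = 2π(1 + a² - 2a cos λ), which is also linear in cos λ. The sketch measures how far each curve is from a straight line.

```python
import numpy as np

lam = np.linspace(0.0, np.pi, 201)
cos_lam = np.cos(lam)


def max_poly_residual(y, x, degree):
    """Largest deviation of y from its least-squares polynomial fit of the given degree in x."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.max(np.abs(y - np.polyval(coeffs, x))))


b, a = 0.6, 0.7                                                   # hypothetical parameters, unit innovation variance
f_ma = np.abs(1 + b * np.exp(-1j * lam)) ** 2 / (2 * np.pi)       # MA(1): (1 + b^2 + 2b cos lam)/(2 pi)
f_ar = 1 / (2 * np.pi * np.abs(1 - a * np.exp(-1j * lam)) ** 2)   # AR(1): 1/f = 2 pi (1 + a^2 - 2a cos lam)

print(max_poly_residual(f_ma, cos_lam, 1))       # rounding-error size: f_MA is linear in cos lam
print(max_poly_residual(1 / f_ar, cos_lam, 1))   # rounding-error size: 1/f_AR is linear in cos lam
print(max_poly_residual(f_ar, cos_lam, 1))       # clearly nonzero: f_AR itself is not
```

With an estimate f*_N in place of the exact densities, the residuals are of course no longer zero. The question then becomes whether they are within the limits of error one is willing to allow.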

8.6.  The  Zeros  and  Maxima  of  Stationary  Stochastic  Processes 

The  interest  in  statistics  like  the  number  of  zeros  or  the  number  of  maxima 
of  time  series  is  twofold.  There  is  first  the  hope  that  such  "shortcut" 
methods  may  shed  some  light  on  the  structure  of  the  process  generating  the 
time  series.  In  various  special  problems  there  is  also  specific  interest  in 
statistics  like  the  number  of  maxima.  Take,  for  example,  the  case  of  an 
airplane  travelling  through  gusty  winds.  One  wants  to  design  the  airplane 
so  that  it  will  withstand  the  extreme  stresses  imposed  on  it.  The  vertical 
component  of  the  velocity  field  of  the  air  can  be  thought  of  as  a  random 
process  and  one  would  then  be  interested  in  the  extremes  of  this  velocity 
component.  Such  problems  arise  whenever  one  has  a  structure  that  must 
withstand  loads  that  are  random. 

Unfortunately  the  problem  of  obtaining  the  distribution  of  zeros  or 
extreme  values  of  stochastic  processes  seems  to  be  rather  difficult  and  there 
are  relatively  few  results.  We  shall  derive  the  expected  value  of  the  number 
of  zeros  and  the  expected  value  of  the  number  of  maxima  in  a  finite  interval 
a  <  t  <  b  of  a  rather  simple  stationary  continuous  parameter  normal  process 
x(t). 

Let x(t) have a jump spectrum with only a finite number of jumps. The spectrum is symmetric about zero so that we need only consider it for λ > 0. Let the jumps be

$$\Delta F(\lambda_j)=a_j^2,\qquad j=1,\dots,n,$$

where λ_j > 0, a_j > 0. The process x(t) can then be written as a finite trigonometric series

$$x(t)=\sum_{j=1}^{n}a_j(X_j\cos t\lambda_j+Y_j\sin t\lambda_j),\qquad(1)$$

where X_1, ..., X_n, Y_1, ..., Y_n are independent normal variables with mean zero and variance 1. A trigonometric series of the form (1) has a bounded number of zeros and turning points in the interval (a, b). The bound depends only on n and the frequencies λ_j.

In our derivation of the mean values of the relevant statistics we shall follow Kac [2]. In counting the zeros of x(t) we count multiple roots only once. The process x(t) and its derivatives are finite trigonometric series. Their sample functions are continuous with probability one. Let

$$\psi_\varepsilon(x)=\begin{cases}1&\text{if }|x|<\varepsilon,\\0&\text{otherwise.}\end{cases}$$

If neither a nor b is a zero of x(t), the number of roots N_n of x(t) in (a, b) is given by

$$N_n=\lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_a^b\psi_\varepsilon(x(t))\,|x'(t)|\,dt.$$

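The probability of either a or b being a zero is zero. The expected value is

$$E(N_n)=\lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_a^b E\{\psi_\varepsilon(x(t))\,|x'(t)|\}\,dt$$

by Fubini's theorem. Now x(t) and x'(t) are independent since they are jointly normal and have covariance zero. Thus

$$E\{\psi_\varepsilon(x(t))|x'(t)|\}=E\psi_\varepsilon(x(t))\,E|x'(t)|=\int_{-\varepsilon}^{\varepsilon}\frac{e^{-x^2/2\sigma_0^2}}{\sqrt{2\pi}\,\sigma_0}\,dx\int_{-\infty}^{\infty}|x|\,\frac{e^{-x^2/2\sigma_1^2}}{\sqrt{2\pi}\,\sigma_1}\,dx=\frac{1}{\pi}\frac{\sigma_1}{\sigma_0}\int_{-\varepsilon}^{\varepsilon}e^{-x^2/2\sigma_0^2}\,dx,$$

where σ_0² = ∫ dF(λ) and σ_1² = ∫ λ² dF(λ) are the variances of x(t) and x'(t) respectively. Since (1/2ε) ∫_{-ε}^{ε} e^{-x²/2σ_0²} dx → 1 as ε → 0, it follows that

$$E(N_n)=\frac{b-a}{\pi}\,\frac{\sigma_1}{\sigma_0}=\frac{b-a}{\pi}\sqrt{\frac{\int\lambda^2\,dF(\lambda)}{\int dF(\lambda)}}.\qquad(2)$$

The maxima of x(t) are zeros of x'(t) where x''(t) < 0. Using a completely analogous argument we see that the mean number of maxima M_n of x(t) in (a, b) is

$$E(M_n)=\frac{b-a}{2\pi}\sqrt{\frac{\int\lambda^4\,dF(\lambda)}{\int\lambda^2\,dF(\lambda)}}.\qquad(3)$$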

One can derive higher order moments of these statistics by an analogous procedure. The moments, however, are quite complicated and do not seem to be related simply to the structure of the process. See Rice [1] for a discussion of the second moment of the number of zeros of the process x(t). One would expect expressions (2) and (3) still to be valid when F(λ) is of a more general form, as long as the moments of F(λ) involved are finite.
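Expressions (2) and (3) can be checked by simulation for a finite trigonometric series of the form (1). The sketch below assumes NumPy, and the jumps, frequencies and interval length are hypothetical. Zeros and maxima are detected as sign changes on a fine time grid, so a pair of roots closer together than the grid spacing would be missed. With the spacing used here such pairs are rare.

```python
import numpy as np


def mean_counts(a, lams, T, reps=1000, grid=2001, seed=0):
    """Monte Carlo means of the number of zeros and of maxima of x(t) on [0, T].

    x(t) = sum_j a_j (X_j cos t lambda_j + Y_j sin t lambda_j) with independent N(0, 1) variables;
    zeros are counted as sign changes of x and maxima as + to - sign changes of x' on the grid.
    """
    rng = np.random.default_rng(seed)
    a, lams = np.asarray(a, dtype=float), np.asarray(lams, dtype=float)
    t = np.linspace(0.0, T, grid)
    C, S = np.cos(np.outer(lams, t)), np.sin(np.outer(lams, t))
    X = rng.standard_normal((reps, a.size)) * a      # a_j X_j, one row per realisation
    Y = rng.standard_normal((reps, a.size)) * a      # a_j Y_j
    x = X @ C + Y @ S
    dx = (Y * lams) @ C - (X * lams) @ S             # x'(t)
    zeros = np.count_nonzero(np.signbit(x[:, 1:]) != np.signbit(x[:, :-1]), axis=1)
    maxima = np.count_nonzero((dx[:, :-1] > 0) & (dx[:, 1:] <= 0), axis=1)
    return zeros.mean(), maxima.mean()


def formulas(a, lams, T):
    """Expressions (2) and (3) for the jump spectrum Delta F(lambda_j) = a_j^2."""
    a2, lams = np.asarray(a, dtype=float) ** 2, np.asarray(lams, dtype=float)
    m0, m2, m4 = (np.sum(a2 * lams ** k) for k in (0, 2, 4))
    return T / np.pi * np.sqrt(m2 / m0), T / (2 * np.pi) * np.sqrt(m4 / m2)


a, lams, T = [1.0, 0.5], [1.0, 3.0], 20.0       # hypothetical jumps, frequencies and interval length
print(mean_counts(a, lams, T))                   # simulated means
print(formulas(a, lams, T))                      # about (10.27, 8.14)
```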


8.7.  Prefiltering  of  a  Time  Series 

We have discussed estimation of the spectrum in Chapters 4 and 6. Many of the results obtained in these chapters are asymptotic. The sample size N at which such asymptotic results are reasonable approximations depends on the variation of the spectrum. This is especially true when estimating the spectral density. Suppose that the spectral density is estimated at the point λ. If the spectral density changes rapidly in relative magnitude in the neighborhood of λ, one must expect a certain amount of contamination of the estimate of f(λ) from the spectrum in the neighborhood of λ. This is serious when f has a relative minimum at λ. Interest in estimating the minimum would be great in the following situation. Suppose that f is the spectrum of vertical wind velocity and that some part of a plane has the resonant frequency λ. It would then be essential to get a decent estimate of f(λ) even though f has a minimum at λ. We have already made a few remarks on such contamination of estimates of the spectrum in section 6.12, the section on computations.

Assume that one does have some a priori information about f in the neighborhood of λ and suspects that the situation described above holds. Tukey suggests that if one wishes to improve one's estimate of f(λ), one should prefilter the time series so as to smooth out the spectrum in the neighborhood of λ and estimate the spectral density of the filtered process. Let us see what effect such prefiltering has on estimation of the spectral density and how it should most advantageously be set up. For simplicity, let us say that the weight function w_N(x) is the weight function proposed by Tukey (see section 4.6). The bias of the estimate is then asymptotically proportional to f''(λ).

Let φ(λ) be the frequency characteristic of the filter. The spectral density of the filtered process is then

$$g(\lambda)=|\varphi(\lambda)|^2f(\lambda).$$

Let the estimate of the spectral density of the filtered process be g*(λ). Then

$$f^*(\lambda)=\frac{g^*(\lambda)}{|\varphi(\lambda)|^2}$$

is the natural estimate of f(λ). The variance of f*(λ) is asymptotically

$$\frac{4\pi g^2(\lambda)}{N|\varphi(\lambda)|^4}\int_{-\pi}^{\pi}w_N^2(x)\,dx=\frac{4\pi f^2(\lambda)}{N}\int_{-\pi}^{\pi}w_N^2(x)\,dx.$$

Note that the asymptotic variance of the estimate of f(λ) is the same whether there has been prefiltering or not.

Let us now see whether the bias of the estimate is affected by prefiltering. The estimate f*(λ) has a bias asymptotically proportional to

$$\frac{g''(\lambda)}{|\varphi(\lambda)|^2}=f''(\lambda)+2f'(\lambda)\frac{(|\varphi(\lambda)|^2)'}{|\varphi(\lambda)|^2}+f(\lambda)\frac{(|\varphi(\lambda)|^2)''}{|\varphi(\lambda)|^2}.\qquad(1)$$

If there is a minimum of f at λ, then

$$f''(\lambda)>0,\qquad f'(\lambda)=0,$$

and f(λ) is small. The bias is then proportional to

$$f''(\lambda)+f(\lambda)\frac{(|\varphi(\lambda)|^2)''}{|\varphi(\lambda)|^2}.$$

Clearly we ought to choose |φ(λ)|² as the reciprocal of the f conjectured on a priori grounds. Since we feel that f has a minimum at λ, |φ(λ)|² will have a maximum at λ and

$$(|\varphi(\lambda)|^2)''<0.$$

But then (1) is decreased in absolute value. The prefiltering does not affect the variance asymptotically, but it does decrease the bias. We might say that this method amounts to pulling yourself up by the bootstraps.
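The bias argument can be illustrated deterministically by approximating the mean of a smoothed estimate as the window-averaged density ∫ w(x) f(λ + x) dx. This ignores leakage and sampling variability. The sketch assumes NumPy. The Hann-shaped window and the quadratic local models of the true and conjectured f near a minimum are hypothetical choices, not Tukey's window. Prefiltering is modelled by smoothing g = |φ|² f with |φ|² = 1/f_conjectured and then dividing by |φ(λ)|².

```python
import numpy as np


def window(x, B):
    """Hypothetical Hann-shaped weight function of half-width B, normalised to integrate to 1."""
    return np.where(np.abs(x) < B, (1 + np.cos(np.pi * x / B)) / (2 * B), 0.0)


def trapezoid(y, dx):
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))


def smoothing_biases(f, f_guess, lam0, B, grid=20001):
    """Approximate biases at lam0 of the smoothed estimate, without and with prefiltering.

    The mean of the estimate is approximated by int w(x) f(lam0 + x) dx; with the filter
    |phi|^2 = 1/f_guess one smooths g = |phi|^2 f instead and divides by |phi(lam0)|^2.
    """
    x = np.linspace(-B, B, grid)
    dx = x[1] - x[0]
    w = window(x, B)
    phi2 = lambda lam: 1.0 / f_guess(lam)
    plain = trapezoid(w * f(lam0 + x), dx)
    prefiltered = trapezoid(w * phi2(lam0 + x) * f(lam0 + x), dx) / phi2(lam0)
    return plain - f(lam0), prefiltered - f(lam0)


# Hypothetical local models near a minimum at lam0 = 1.
f_true = lambda lam: 0.05 + 1.2 * (lam - 1.0) ** 2
f_conj = lambda lam: 0.10 + 1.0 * (lam - 1.0) ** 2
print(smoothing_biases(f_true, f_conj, 1.0, 0.5))   # prefiltered bias is much smaller in absolute value
print(smoothing_biases(f_true, f_true, 1.0, 0.5))   # an exact conjecture removes the smoothing bias
```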
8.8.  Comments  on  Tests  of  Normality 

Many of the statistical techniques available for the analysis of stationary time series have been designed for normal processes. Today one urgently needs new methods of analysis valid under more general conditions. Although some work in this direction has already been carried out, a determined effort is necessary to obtain generally useful results.

When applying a method constructed under the very common assumption of normality, one has to verify that this condition is satisfied. In certain cases it is possible to show by physical arguments that it holds approximately. Otherwise one has to investigate this question empirically by analyzing samples drawn from the process studied. One way of formulating this problem is to say that we want to test the hypothesis that the sample has been obtained from a normal process. It may be that this is not the best way to formulate the problem, and the following discussion should be considered a preliminary discussion rather than a recommendation of specific methods.

Let x_t, 0 < t < T, be the observed sample. It is clear that somewhere in the construction of the statistic formed for the purpose of testing normality a non-linear element has to be inserted. A. J. F. Siegert [1] has suggested the following test.

Suppose that the mean m and covariance function R(t) have been specified. Then we introduce the statistic

$$s^2(a)=[\theta(a)-\phi(a)]^2,$$

where θ(a) is the fraction of the time T that x_t is greater than a, and φ(a) = Eθ(a). To test the normality assumption we can choose as a critical region

$$W=\{s^2(a)>c\}\qquad\text{or}\qquad W=\left\{\int_{-\infty}^{\infty}s^2(a)\,G(a)\,da>c\right\},$$

where G(a) is a suitably chosen weight function. Unfortunately the significance level corresponding to a given value of c is difficult to evaluate. This will become still more difficult in the important situation when m and R(t) are not known a priori but have to be estimated from the sample.
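The sketch below computes s²(a) and the weighted integral ∫ s²(a) G(a) da for a discretely observed sample. It assumes NumPy. Because the observations are discrete, the fraction of sample points above a stands in for the fraction of the time T. By stationarity φ(a) = P(x_t > a), which depends only on m and R(0). The standard normal density used as G(a) is a hypothetical choice. The sketch does not address the hard part noted above, namely the significance level attached to a given c.

```python
import math
import numpy as np


def siegert_statistic(x, m, r0, a_grid, weight):
    """s^2(a) = [theta(a) - phi(a)]^2 on a grid of levels and its weighted integral int s^2(a) G(a) da."""
    x = np.asarray(x, dtype=float)
    theta = (x[None, :] > a_grid[:, None]).mean(axis=1)        # fraction of the sample above each level a
    # phi(a) = E theta(a) = P(x_t > a); by stationarity it depends only on m and R(0) = r0.
    phi = np.array([0.5 * math.erfc((a - m) / math.sqrt(2.0 * r0)) for a in a_grid])
    s2 = (theta - phi) ** 2
    da = a_grid[1] - a_grid[0]
    return s2, float(np.sum(s2 * weight(a_grid)) * da)


rng = np.random.default_rng(2)
n, rho = 20000, 0.5
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + math.sqrt(1 - rho ** 2) * e[t]     # stationary normal AR(1), m = 0, R(0) = 1
binary = rng.choice([-1.0, 1.0], size=n)                       # non-normal, same mean and variance

a_grid = np.linspace(-5.0, 5.0, 801)
G = lambda a: np.exp(-a ** 2 / 2) / math.sqrt(2 * math.pi)     # hypothetical weight function G(a)
_, stat_normal = siegert_statistic(x, 0.0, 1.0, a_grid, G)
_, stat_binary = siegert_statistic(binary, 0.0, 1.0, a_grid, G)
print(stat_normal, stat_binary)
```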

Another possible approach is given by the following test, which is described in a discrete parameter context. If the process is completely nondeterministic we know (see section 2.2) that it can be represented as

$$x_t=\sum_{\nu=0}^{\infty}a_\nu\xi_{t-\nu}.$$

If we demand that ξ_t belong to the closed linear manifold spanned by x_s, s ≤ t, the stochastic variables ξ_t are determined by the x_t's and there is a linear inversion formula

$$\xi_t=\mathfrak{L}(x_t,x_{t-1},\dots)$$

using all or some of the past values of the process. In practice, when we have observed only a finite sample x_1, x_2, ..., x_n, the operator 𝔏 will have to be replaced by a linear form 𝔏* in these variables. This introduces an end-effect, which one hopes will be negligible for large values of n. On estimating the form of 𝔏* from the sample and applying the inversion formula we obtain a sequence of stochastic variables ξ*_t. If it can be shown that for a suitable choice of 𝔏* the ξ*_t deviate only slightly from the "true" values ξ_t, we could apply one of the standard tests of normality of sequences of independent stochastic variables. One could, for example, consider the stochastic variables

$$\xi_t^*=\sum_{\nu}c_\nu^*x_{t-\nu},$$

where

$$c_\nu^*=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i\nu\lambda}}{\sqrt{2\pi f^*(\lambda)}}\,d\lambda$$

and f*(λ) is an estimate of the spectral density. This seems reasonable since

$$c_\nu=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i\nu\lambda}}{\sqrt{2\pi f(\lambda)}}\,d\lambda$$

implies that

$$E\xi_t\bar{\xi}_s=\int_{-\pi}^{\pi}\left|\sum_\nu c_\nu e^{i\nu\lambda}\right|^2e^{i(t-s)\lambda}f(\lambda)\,d\lambda=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{i(t-s)\lambda}\,d\lambda=\delta_{ts}.$$

The ξ_t are orthonormal stochastic variables, and hence independent and identically distributed under the normality assumption.
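A sketch of this second test follows, assuming NumPy. A first order autoregressive density fitted to the sample serves as f*(λ), which is a hypothetical choice made for brevity. The coefficients c*_ν are computed by quadrature and truncated at |ν| ≤ 50. The series is then filtered, and the sample skewness and the excess kurtosis ε = Eξ⁴ - 3 are examined. The normal series is contrasted with one driven by Laplace innovations.

```python
import numpy as np


def whitening_coefficients(spec, V, M=4096):
    """c_v = (1/2pi) int e^{i v lam} (2 pi f*(lam))^{-1/2} dlam for |v| <= V (rectangle rule on M points)."""
    lam = -np.pi + 2 * np.pi * np.arange(M) / M
    k = 1.0 / np.sqrt(2 * np.pi * spec(lam))
    v = np.arange(-V, V + 1)
    c = (np.exp(1j * np.outer(v, lam)) @ k) / M
    return c.real                      # f* is even here, so the coefficients are real and symmetric


def whiten(x, c):
    """xi*_t = sum_v c_v x_{t-v}; mode 'valid' drops V values at each end (the end-effect)."""
    return np.convolve(x, c, mode="valid")


def ar1_plug_in(x):
    """Hypothetical choice of f*: a first order autoregressive density fitted to x."""
    a = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    s2 = (1 - a ** 2) * np.mean(x ** 2)
    return lambda lam: s2 / (2 * np.pi * np.abs(1 - a * np.exp(-1j * lam)) ** 2)


def simulate_ar1(innovations, a):
    x = np.empty_like(innovations)
    x[0] = innovations[0]
    for t in range(1, innovations.size):
        x[t] = a * x[t - 1] + innovations[t]
    return x


def skew_and_excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3)


rng = np.random.default_rng(3)
n = 20000
x_normal = simulate_ar1(rng.standard_normal(n), 0.6)
x_laplace = simulate_ar1(rng.laplace(scale=1 / np.sqrt(2), size=n), 0.6)   # non-normal alternative
xi_normal = whiten(x_normal, whitening_coefficients(ar1_plug_in(x_normal), V=50))
xi_laplace = whiten(x_laplace, whitening_coefficients(ar1_plug_in(x_laplace), V=50))
print(np.var(xi_normal), np.corrcoef(xi_normal[:-1], xi_normal[1:])[0, 1])
print(skew_and_excess_kurtosis(xi_normal), skew_and_excess_kurtosis(xi_laplace))
```

The symmetric filter built from c*_ν whitens the series but is not the causal inverse of the moving average representation. Under a non-normal alternative, each ξ*_t therefore mixes several innovations, and its excess kurtosis is somewhat smaller than that of the innovations.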


This has been a heuristic discussion of possible tests of normality. It would be interesting to investigate the conjectures made above in a rigorous manner. It would also be desirable to discuss in some detail the alternative hypotheses one has in mind when constructing these tests. A reasonable alternative hypothesis might be ε = Eξ_t⁴ - 3 ≠ 0.
PROBLEMS


Chapter  1 

1. Let x_t, Ex_t = 0, t = ..., -1, 0, 1, ..., be a weakly stationary process with spectral distribution function F(λ). One often finds a statement saying that x_t has the representation

$$x_t=\int_{-\pi}^{\pi}e^{it\lambda}\,dZ(\lambda),$$

where E dZ(λ) dZ̄(μ) = δ_{λμ} dF(λ). Here δ_{λμ} is the Kronecker δ. How should one interpret the differential notation so as to make this a meaningful statement?

2. A real-valued weakly stationary process x_t, t = ..., -1, 0, 1, ..., Ex_t = 0, has the complex representation given in the preceding example and the real representation

$$x_t=\int_0^\pi\cos t\lambda\,dZ_1(\lambda)+\int_0^\pi\sin t\lambda\,dZ_2(\lambda),$$

where E dZ_i(λ) dZ_j(μ) = δ_{ij} δ_{λμ} 2 dF(λ), i, j = 1, 2.

Show that dZ_1(λ) = 2 Re dZ(λ), dZ_2(λ) = -2 Im dZ(λ).

3. Let x_t, Ex_t = 0, t ∈ [0, 2π], be weakly stationary on the unit circle. We understand by this that t = 0 is identified with t = 2π and that

$$Ex_t\bar{x}_s=r_u,\qquad u=t-s \bmod 2\pi.$$

Show that

$$r_t=\sum_{n=-\infty}^{\infty}a_ne^{itn},$$

where the coefficients a_n are the Fourier coefficients of r_t, and that

$$x_t=\sum_{n=-\infty}^{\infty}z_ne^{itn},$$

where E z_n z̄_m = δ_{nm} a_n.


4. Let x_t, Ex_t = 0, t = ..., -1, 0, 1, ..., be a strictly stationary normal process. Show that the process is ergodic if and only if the spectral distribution function F(λ) of x_t is continuous, that is, if and only if F(λ) has no jumps (see Grenander [1] and Maruyama [1]).

5. Let x_t be a strictly stationary normal process just as in the previous example. A strictly stationary process is said to be mixing if for every two measurable sets A, B, P(A ∩ T^n B) → P(A)P(B) as n → ∞. Here T is the one-step translation operator. Show that the normal process x_t is mixing if and only if r_t → 0 as |t| → ∞ (see Maruyama [1]).


6.  Give  an  example  of  a  normal  strictly  stationary  process  with  a  singular 
distribution  function  that  is  mixing. 

7.  Show  that  a  strictly  stationary  process  that  is  mixing  is  ergodic  (see 
Hopf  [1]). 

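8. Let x_t = (x_t^{(1)}, x_t^{(2)}), Ex_t = 0, t = ..., -1, 0, 1, ..., be a weakly stationary process with real-valued components. Since x_t has real-valued components, it follows that

$$dF_{ij}(\lambda)=\overline{dF_{ij}(-\lambda)},\qquad i,j=1,2.$$

Show that the process x_t has the real representation

$$x_t^{(1)}=\int_0^\pi\cos t\lambda\,dZ_1^{(1)}(\lambda)+\int_0^\pi\sin t\lambda\,dZ_2^{(1)}(\lambda),$$

$$x_t^{(2)}=\int_0^\pi\cos t\lambda\,dZ_1^{(2)}(\lambda)+\int_0^\pi\sin t\lambda\,dZ_2^{(2)}(\lambda),$$

where

$$E\,dZ_1^{(i)}(\lambda)\,dZ_1^{(i)}(\mu)=E\,dZ_2^{(i)}(\lambda)\,dZ_2^{(i)}(\mu)=2\delta_{\lambda\mu}\,dF_{ii}(\lambda),\qquad i=1,2,$$

$$E\,dZ_j^{(1)}(\lambda)\,dZ_j^{(2)}(\mu)=2\delta_{\lambda\mu}\operatorname{Re}dF_{12}(\lambda),\qquad j=1,2,$$

$$E\,dZ_1^{(1)}(\lambda)\,dZ_2^{(2)}(\mu)=-E\,dZ_2^{(1)}(\lambda)\,dZ_1^{(2)}(\mu)=2\delta_{\lambda\mu}\operatorname{Im}dF_{12}(\lambda).$$

If the process has an absolutely continuous spectral distribution function, the real and imaginary parts of the cross-spectral density, Re f_12(λ) and Im f_12(λ), are sometimes referred to as the cospectrum and quadrature spectrum of x_t^{(1)} and x_t^{(2)}.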

9. Show that a harmonizable process x_t with

$$F(\lambda,\mu)=\int_{-\pi}^{\lambda}\int_{-\pi}^{\mu}f(u,v)\,du\,dv$$

absolutely continuous has covariances E x_s x̄_t → 0 as |s| → ∞ or |t| → ∞. What kind of a harmonizable process has f(u, v) = f?

10. Let x_n, n = ..., -1, 0, 1, ..., be a family of independent and identically distributed stochastic variables with probability distribution P{x_n = +1} = P{x_n = -1} = 1/2. Let y_n = f(x_n, x_{n+1}), where a = f(-1, -1), b = f(-1, 1), c = f(1, -1), d = f(1, 1). Thus y_n = a, b, c, d, each with probability 1/4. Find cov(y_j, y_k). For what values of a, b, c, d is the process y_n an example of "white noise" but not an example of "pure white noise"?
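Because y_n depends only on (x_n, x_{n+1}), both cov(y_n, y_{n+1}) and the question of independence can be settled by enumerating the eight equally likely triples (x_n, x_{n+1}, x_{n+2}). The helpers below are a hypothetical aid for exploring candidate values of a, b, c, d, using only the Python standard library. Exact fractions avoid rounding when the joint law is compared with the product of the marginals.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def values(a, b, c, d):
    """The four values of f, keyed by (x_n, x_{n+1})."""
    return {(-1, -1): a, (-1, 1): b, (1, -1): c, (1, 1): d}


def lag_covariances(a, b, c, d):
    """Var y_n and cov(y_n, y_{n+1}) by enumerating (x_n, x_{n+1}, x_{n+2}), each triple with probability 1/8."""
    f = values(a, b, c, d)
    triples = list(product((-1, 1), repeat=3))
    mean = sum(f[(u, v)] for u, v, _ in triples) / 8
    var = sum(f[(u, v)] ** 2 for u, v, _ in triples) / 8 - mean ** 2
    cov1 = sum(f[(u, v)] * f[(v, w)] for u, v, w in triples) / 8 - mean ** 2
    return var, cov1          # for |j - k| >= 2, y_j and y_k depend on disjoint x's, so the covariance is 0


def lag_one_dependent(a, b, c, d):
    """True if the joint law of (y_n, y_{n+1}) is not the product of the marginal laws."""
    f = values(a, b, c, d)
    joint = Counter()
    for u, v, w in product((-1, 1), repeat=3):
        joint[(f[(u, v)], f[(v, w)])] += Fraction(1, 8)
    marginal = Counter()
    for (y1, _), prob in joint.items():
        marginal[y1] += prob              # by stationarity y_{n+1} has the same marginal law
    return any(joint[(s, t)] != marginal[s] * marginal[t] for s in marginal for t in marginal)


print(lag_covariances(0, 1, 1, 2))       # y_n = (x_n + 1)/2 + (x_{n+1} + 1)/2: (0.5, 0.25)
print(lag_one_dependent(0, 1, 1, 2))
```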

11. Let x_t, Ex_t = 0, -∞ < t < ∞, be a weakly stationary process. Let the spectrum of the process be band-limited to the band from -πw to πw so that x_t has the representation

$$x_t=\int_{-\pi w}^{\pi w}e^{it\lambda}\,dZ(\lambda).$$

Show that

$$x_t=\sum_{n=-\infty}^{\infty}x_{n/w}\,\frac{\sin\pi(wt-n)}{\pi(wt-n)}.$$
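The sampling series can be tried numerically on a single band-limited path. The sketch below assumes NumPy and truncates the series at |n| ≤ 20000. The test path, a sum of two sinusoids with frequencies 1 and 2 inside (-π, π) for w = 1, is hypothetical. The truncated series converges slowly, only like 1/n_max, which is why many terms are used.

```python
import numpy as np


def sinc_reconstruct(x, w, t, n_max):
    """Truncated sampling series sum_{|n| <= n_max} x(n/w) sin(pi(wt - n)) / (pi(wt - n))."""
    n = np.arange(-n_max, n_max + 1)
    return float(np.sum(x(n / w) * np.sinc(w * t - n)))   # np.sinc(u) = sin(pi u)/(pi u)


# Hypothetical band-limited path: frequencies 1 and 2 lie inside (-pi w, pi w) for w = 1.
path = lambda t: np.cos(1.0 * t) + 0.5 * np.sin(2.0 * t + 0.3)
for t in (0.37, 3.5):
    print(t, path(t), sinc_reconstruct(path, 1.0, t, 20000))
```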


Chapter  2 

1. Let x_t, Ex_t = 0, t = ..., -1, 0, 1, ..., be an autoregressive process with spectral density

$$f(\lambda)=\frac{1}{2\pi}\,|a(e^{-i\lambda})|^{-2},$$

where a(z) = Σ_{k=0}^{p} a_k z^k has all its zeros z_1, ..., z_p outside the unit circle, that is, in |z| > 1. Assume that the roots z_i are simple. Find an explicit expression for the best linear predictor ν steps ahead in the sense of least squares in terms of the random variables x_t.

2. Assume that

$$x_t=\sum_{k=1}^{p}z_ke^{it\lambda_k},$$

t = ..., -1, 0, 1, ..., where Ez_k = 0, E|z_k|² < ∞ and 0 < λ_k < π. The λ_k are assumed to be known. Such a process is completely deterministic. If x_1, ..., x_p have been observed, show that x_{p+1} is determined. Give an explicit expression for x_{p+1} in terms of x_1, ..., x_p.
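One way to see that x_{p+1} is determined is to use a linear recurrence. The polynomial Π_k (z - e^{iλ_k}) = Σ_{j=0}^{p} c_j z^{p-j} annihilates every term z_k e^{itλ_k}, so Σ_j c_j x_{t+p-j} = 0 and x_{p+1} = -Σ_{j=1}^{p} c_j x_{p+1-j}. This is a sketch of one route to the answer, assuming NumPy and distinct known frequencies. The frequencies and amplitudes below are hypothetical.

```python
import numpy as np


def extrapolate_next(x_obs, lams):
    """x_{p+1} from x_1, ..., x_p when x_t = sum_k z_k e^{i t lambda_k} with p known distinct frequencies.

    prod_k (z - e^{i lambda_k}) = sum_{j=0}^p c_j z^{p-j} annihilates every term, so
    sum_j c_j x_{t+p-j} = 0 and x_{p+1} = -sum_{j=1}^p c_j x_{p+1-j}.
    """
    c = np.poly(np.exp(1j * np.asarray(lams, dtype=float)))   # c[0] = 1, ..., c[p]
    p = len(lams)
    x_rev = np.asarray(x_obs, dtype=complex)[::-1]             # (x_p, x_{p-1}, ..., x_1)
    return -np.dot(c[1:], x_rev[:p])


rng = np.random.default_rng(4)
lams = np.array([0.4, 1.1, 2.5])                               # hypothetical known frequencies
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)       # one realisation of the amplitudes
x = lambda t: np.sum(z * np.exp(1j * t * lams))
print(extrapolate_next([x(1), x(2), x(3)], lams), x(4))
```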

3. Let x_t, Ex_t = 0, -∞ < t < ∞, be a weakly stationary process. Show that x_t is infinitely differentiable if

oo 

/  du*  dF(X)<  oo 

—  OO 

for  some  £4=  0,  where  F(X)  is  the  spectral  distribution  function  of  xt. 

4 (Continuation). Show that the whole history of x_t can be predicted without any error if

oo 

j  e^aidF{?.)<oo 

—  oo 

for  some  1 =4=  0. 

5  (Continuation).  Give  an  explicit  representation  of  the  predictor  that 
would  verify  the  preceding  statement. 

6. In both the continuous and discrete parameter case, a weakly stationary process is completely deterministic if its spectrum is band limited. With a finite part of the past one can predict perfectly in the continuous parameter case. Verify that this is not generally true in the case of a discrete parameter process.

7. Let y_t = x_t + m, t = ..., -1, 0, 1, ..., Ey_t = m, where x_t is a weakly stationary process. The mean value m is assumed to be unknown. Find the one-step prediction error for the best linear unbiased predictor (the mean value of the predictor is m).

8. Let x_t, Ex_t = 0, t = ..., -1, 0, 1, ..., be a strictly stationary Markov process with a finite number of states and transition probability matrix M. Assume that M has only simple eigenvalues. Find the spectrum of x_t in terms of the eigenvalues of M.


Chapter  3 

1.  Select  a  sequence  of  random  numbers  from  a  table  of  random  numbers. 
Illustrate  Slutzky's  theorem  by  repeatedly  applying  a  fixed  smoothing 
operation  to  the  sequence. 
2. Consider the ratio x'Ax/x'x, where x = (x_1, x_2, ..., x_n) and the x_j's are normal, independent stochastic variables with mean zero and variance σ². Let

$$A=\begin{pmatrix}B&0&0\\0&\lambda I&0\\0&0&B\end{pmatrix},$$

where B is a real symmetric k × k matrix with distinct eigenvalues λ_1 > λ_2 > ... > λ_k, and λ < λ_k or λ > λ_1. Here I is an identity matrix of order p. Find the probability distribution of the ratio x'Ax/x'x (see Durbin and Watson

3 (Continuation). What is the distribution of the ratio x'Ax/x'x if p = 0?

4 (Continuation). Find the exact probability distribution of

$$\sum_{j=1}^{n-1}x_jx_{j+1}\Big/\sum_{j=1}^{n}x_j^2,\qquad n=2m,$$

and

$$\sum_{j=1}^{n-1}x_jx_{j+1}\Big/\sum_{j=1}^{n}x_j^2,\qquad n=2m+1.$$

5.    (Continuation).    Find  the  exact  distribution  of 

n-l  /    n 

2  (iy-a?y+i)a  /  2  (^-^)2>     »  =  2m, 

/    ;=i 


and 


where 


7=1 

y+m 


n-l 


2  (Xj  —  Xj+i)2  /  2  (xi ~ ^)2>     n  =  2m  +  l. 


7=1  /      7=1 

y=j=m,  m  +  l 


1     n 

nyfl    ; 


Chapter  4 

1. Let (x_t, y_t), Ex_t = Ey_t = 0, t = ..., -1, 0, 1, ..., be a normally distributed stationary process with an absolutely continuous spectral distribution function. The spectral density, which is matrix-valued (2 × 2), is assumed to be nonsingular for all λ. The observations (x_1, y_1), ..., (x_N, y_N) are made. Let
$${}_{12}r_\nu^*=\begin{cases}\dfrac{1}{N}\displaystyle\sum_{t=1}^{N-\nu}x_ty_{t+\nu}&\text{if }\nu\ge 0,\\[2mm]\dfrac{1}{N}\displaystyle\sum_{t=1-\nu}^{N}x_ty_{t+\nu}&\text{if }\nu<0.\end{cases}$$

Show that


1  1      N 

Re  12/ ^  U)  =  ^— 12^0  ^0  }  +  -  2  nrt  wi  )  cos  v  X 

£71  71  v  =  \ 


1  N 

and  Im  12f%  (X)  =  -  2  i2r?  ^^}  sin  r  A 

7T  „=i 

are  consistent  estimates  of  the  cospectrum  Re/12(^.)  and  the  quadrature 
spectrum  Im/12(A)  respectively  as  N-+00  if 

as  iV^oo  for  every  fixed  v  and 

*    v  I  N 

v  =  Q   -M  I     v  =  0 

as  N-+00  (see  Grenander  and  Rosenblatt  [6]). 

2 (Continuation). Obtain an asymptotic expression for the variance of the estimate Re_12 f*_N(λ) of the cospectrum as N → ∞.

3 (Continuation). Obtain an asymptotic expression for the variance of the estimate Im_12 f*_N(λ) of the quadrature spectrum as N → ∞.

4 (Continuation). Obtain asymptotic expressions for the bias of the estimates of the cospectrum and quadrature spectrum as N → ∞.

5. Let x_{t,τ}, Ex_{t,τ} = 0, t, τ = ..., -1, 0, 1, ..., be a normally distributed stationary process. The process x_{t,τ} is assumed to have an absolutely continuous spectral distribution function and a continuous spectral density. The process is observed for t, τ = 1, ..., N. The function

$$I_N(\lambda,\mu)=\frac{1}{4\pi^2N^2}\left|\sum_{t,\tau=1}^{N}x_{t,\tau}\,e^{i(t\lambda+\tau\mu)}\right|^2$$

is a two-dimensional analogue of the periodogram. Obtain an asymptotic expression for the variance of I_N(λ, μ) as N → ∞. Is I_N(λ, μ) asymptotically unbiased? See Grenander and Rosenblatt [6].
6 (Continuation). Let


*   v 

rt.  r~  ^2    Z,  xct,pxa.+t,p+r 
iV      ol.B 


N 

*     ««(n')  s,-itA.-iTft 


and  f%(lf*)=     2     >**r*e 

Show that f*_N(λ, μ) is a consistent estimate of the spectral density f(λ, μ) as N → ∞ if

as  iV->oo  for  every  fixed  t,  r  and 

t,r=-N   ^*  /     t,r  =  -N 

as  2V-^oo. 

7 (Continuation). Obtain an asymptotic expression for the variance of f*_N(λ, μ) as N → ∞.

8. Let x_t, Ex_t = 0, t = ..., -1, 0, 1, ..., be a normal stationary process. Let f*_N(λ) be a spectrograph estimate of the spectral density f(λ). Under the assumptions made in Chapter 4 on the spectrum and the weights w_ν^{(N)}, obtain an asymptotic expression for

$$\int_{-\pi}^{\pi}E|f_N^*(\lambda)-f(\lambda)|^2\,d\lambda.$$

9. Let x_t, Ex_t = 0, -∞ < t < ∞, be a weakly stationary process with an absolutely continuous spectral distribution function and a continuous spectral density. Assume that one wishes to estimate the spectral density function and that one has to discretize the data in order to carry out the computations. One can discretize the data by considering y_k = x_{kh}, k = ..., -1, 0, 1, ..., or else by looking at

$$y_k=\int_{(k-\frac12)h}^{(k+\frac12)h}x_t\,dt,\qquad k=\dots,-1,0,1,\dots.$$

Discuss the advantages and disadvantages of these two ways of discretizing the data.
10. Explain the discontinuity in the asymptotic expressions for the variance of spectrograph estimates of the spectral density at λ = 0.

11. What is the effect of observing a finite sample x_1, ..., x_N on the resolving power of a spectrograph estimate of the spectral density?

Chapter  6 

1. Let x_t = Σ_{k=-∞}^{∞} a_{t-k} ξ_k, where the ξ_k are independent and identically distributed and

$$E\xi_k=0,\qquad E\xi_k^4<\infty,\qquad\sum_k a_k^2<\infty.$$

The values x_1, x_2, ..., x_N are observed. Consider the correlation coefficients r*_ν/r*_0, ν ≠ 0. Obtain asymptotic expressions for the covariances of the correlation coefficients (see Bartlett [3]).

2 (Continuation). Bartlett has suggested using G*_N(λ) = F*_N(λ)/r*_0 as a statistic in estimating G(λ) = F(λ)/r_0, the normalized spectral distribution function. Find the asymptotic distribution of

$$\max_{0\le\lambda\le\pi}|G_N^*(\lambda)-G(\lambda)|,$$

when x_t is pure white noise, as N → ∞ (see Bartlett [3]).

3 (Continuation). Find an asymptotic expression for the covariance function of the process G*_N(λ), when x_t is a general linear scheme, as N → ∞.

4. Let x_t, Ex_t = 0, -∞ < t < ∞, be an ergodic strictly stationary process. Assume that x_t has an absolutely continuous spectral distribution function with spectral density f(λ), and that


oo 


J*  emf(X)dk<oo 


—  00 


for  some  2=h  0.  Show  that  knowledge  of  any  finite  part  of  the  record  of  the 
process  xt,  a  <  t  <b,  is  enough  to  determine  the  spectrum  exactly. 

Chapter  7 

1. Let y_t = x_t + m, Ex_t = 0, t = ..., -1, 0, 1, ..., where x_t is a first order stationary autoregressive scheme. Assume that y_1, ..., y_n are observed. Get exact expressions for the least squares and Markov estimates of m for finite sample size n.
2 (Continuation). Get exact expressions for the variances of the least squares and Markov estimates of m for finite sample size n. What is the magnitude of these variances for n = 10, 15, 25?

3. Let

$$\begin{pmatrix}{}_1y_t\\ {}_2y_t\end{pmatrix}=\begin{pmatrix}{}_1x_t\\ {}_2x_t\end{pmatrix}+\begin{pmatrix}{}_1m_t\\ {}_2m_t\end{pmatrix},$$

Ex_t = 0, t = ..., -1, 0, 1, ..., where x_t is a weakly stationary process with an absolutely continuous spectral distribution function and a continuous and nonsingular spectral density function. The components of x_t are real-valued. Assume that y_1, ..., y_n are observed. Let ₁m_t = m, ₂m_t = m. Get asymptotic expressions for the variances of the least squares and Markov estimates of m as n → ∞ and compare them (see Rosenblatt [1], [2]).

4 (Continuation). Let ₁m_t = m_1 cos tλ, ₂m_t = m_2 cos tλ. Get asymptotic expressions for the covariance matrices of the least squares and Markov estimates of (m_1, m_2) respectively as n → ∞ and compare them.

5 (Continuation). Let ${}_1m_t = \alpha_1\cos t\lambda + \beta_1\sin t\lambda$, ${}_2m_t = \alpha_2\cos t\lambda + \beta_2\sin t\lambda$. Get asymptotic expressions for the covariance matrices of the least squares and Markov estimates of $(\alpha_1, \beta_1, \alpha_2, \beta_2)$ respectively as $n \to \infty$ and compare them.

6. Let $y_{t,\tau} = x_{t,\tau} + m$, $E y_{t,\tau} = m$, $t, \tau = \dots, -1, 0, 1, \dots$, where $x_{t,\tau}$ is a weakly stationary process with an absolutely continuous spectral distribution function and a continuous spectral density. Assume that $y_{t,\tau}$, $t, \tau = 1, \dots, n$, is observed. Get asymptotic expressions for the least squares and Markov estimates of $m$ as $n \to \infty$ and compare them.

7. Let $(x_t, y_t)$, $E x_t = E y_t = 0$, $t = \dots, -1, 0, 1, \dots$, be a normal process with an absolutely continuous spectral distribution function and a continuous nonsingular spectral density. Find the conditional spectral density of the process $x_t$, $t = \dots, -1, 0, 1, \dots$, under the condition that $y_t = y$.


Chapter  8 

1. Consider the asymptotic expression derived in Chapter 8 for the one-step prediction error when the spectral density has been estimated by a spectrograph estimate. Use this expression to evaluate the prediction error when the weight function of the spectrograph estimate is equal to $\pi/(2h_N)$ for $|\lambda| < h_N$ and to $0$ otherwise.

How should one let $h_N$ behave as a function of $N$ so as to have the prediction error approach the limiting prediction error as $N \to \infty$ as fast as possible?

2. Let $x_t$, $E x_t = 0$, $-\infty < t < \infty$, be a normal stationary process with continuously differentiable sample functions. Evaluate the probability of $x_t$ passing through zero in the interval $(t, t + dt)$ with negative slope on condition that $x_t$ passes through zero at $t = 0$ with positive slope. This probability can be used to get an approximation to the probability distribution of the distance $t$ between zeros when $t$ is small (see Rice [1]).

3. Let $x_t$, $E x_t = 0$, $t = \dots, -1, 0, 1, \dots$, be a real-valued strictly stationary process. Consider the third order moments (which are assumed to exist)

$$r_{t_1-t_2,\,t_2-t_3} = r_{t_1,t_2,t_3} = E\,x_{t_1}x_{t_2}x_{t_3}.$$

Define a third order spectral distribution function in terms of these moments. What symmetry conditions on the moments and the third order spectrum are implied by the fact that $x_t$ is real-valued? What happens to the third order spectrum when $x_t$ is passed through a linear filter?

4 (Continuation). Consider the corresponding questions for the fourth order moments

$$r_{t_1-t_2,\,t_2-t_3,\,t_3-t_4} = r_{t_1,t_2,t_3,t_4} = E\,x_{t_1}x_{t_2}x_{t_3}x_{t_4}.$$

5 (Continuation). What is the character of the third and fourth order spectra if $x_t$ is a normal process?
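For Problems 3 and 4 it may help to have the obvious sample analogue of the higher order moments at hand. The sketch below assumes $E x_t = 0$ (so no mean is subtracted), indexes the moments by nonnegative lags relative to the first time point rather than by the successive differences used in the problems, and divides by the record length $N$; all of these are choices made for illustration, and the function name is hypothetical.

```python
def sample_moment(x, lags):
    """Sum of x[t] * x[t + l_1] * ... * x[t + l_k] over the available t, divided by N = len(x).

    The lags (l_1, ..., l_k) must be nonnegative; with k = 2 this estimates a third
    order moment, with k = 3 a fourth order moment. Assumes E x_t = 0 (no mean correction).
    """
    n = len(x)
    if any(lag < 0 for lag in lags):
        raise ValueError("lags must be nonnegative")
    span = max(lags, default=0)
    if span >= n:
        raise ValueError("lags exceed the length of the record")
    total = 0.0
    for t in range(n - span):
        prod = x[t]
        for lag in lags:
            prod *= x[t + lag]
        total += prod
    return total / n


record = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9]   # arbitrary illustrative values
print(sample_moment(record, (1, 2)))     # third order, lags 1 and 2
print(sample_moment(record, (2, 1)))     # agrees with the line above up to rounding
print(sample_moment(record, (1, 2, 3)))  # fourth order
```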
APPENDIX ON COMPLEX VARIABLE THEORY


Let $f(z)$ be a function analytic in the unit circle $|z| < 1$. According to a theorem of Hardy (see Hardy [1]), the mean value

$$\mu_\alpha(f;r) = \frac{1}{2\pi}\int_0^{2\pi} |f(re^{i\theta})|^\alpha\,d\theta, \qquad \alpha > 0, \quad (r < 1)$$

is a nondecreasing function of $r$. The function $f(z)$ is said to belong to the class $H_\alpha$ if $\mu_\alpha(f;r)$ is bounded as $r \to 1-$ and tends to a limit $\mu_\alpha(f)$. The classes $H_2$ and especially $H_1$ will be of particular interest to us.

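If $f(z) \in H_2$, then as $r \to 1-$, $f(re^{i\theta})$ converges in the mean square to a limit function $f(e^{i\theta})$ of the class $L_2$. Since $f(z) \in H_2$,

$$\lim_{r\to 1-}\mu_2(f;r) = \lim_{r\to 1-}\sum_{n=0}^{\infty} |a_n|^2 r^{2n} = \sum_{n=0}^{\infty} |a_n|^2 < \infty,$$

where $f(z) = \sum_{n=0}^{\infty} a_n z^n$. But then

$$\lim_{r',r\to 1-}\int_0^{2\pi} |f(re^{i\theta}) - f(r'e^{i\theta})|^2\,d\theta = \lim_{r',r\to 1-} 2\pi\sum_{n=0}^{\infty} |a_n|^2\,|r^n - r'^n|^2 = 0$$

by dominated convergence, since $|r^n - r'^n| \le 1$. Hence there is a limit in the mean square, and it is easily seen that this limit is

$$f(e^{i\theta}) = \sum_{n=0}^{\infty} a_n e^{in\theta} \in L_2.$$

From the proof it can be seen that $\mu_2(f;r) = \sum_{n=0}^{\infty} |a_n|^2 r^{2n}$ is a nondecreasing function of $r$.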
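A small numerical check of the identity $\mu_2(f;r) = \sum_n |a_n|^2 r^{2n}$ and of the monotonicity in $r$, assuming a polynomial $f$ (so that every sum is finite) and approximating the mean over the circle by the trapezoidal rule on $m$ equally spaced angles. For $\alpha = 2$ that rule is exact up to rounding as long as the degree of $f$ is below $m$, since $|f|^2$ is then a trigonometric polynomial of degree below $m$. The coefficients are arbitrary and the function names are hypothetical.

```python
import cmath
import math


def circle_mean(f, r, alpha, m=2048):
    """Trapezoidal approximation of mu_alpha(f; r) = (1/2pi) int_0^{2pi} |f(r e^{i theta})|^alpha d theta."""
    return sum(abs(f(r * cmath.exp(2j * math.pi * k / m))) ** alpha for k in range(m)) / m


def mu2_from_coefficients(a, r):
    """mu_2(f; r) = sum_n |a_n|^2 r^(2n) for f(z) = sum_n a_n z^n with finitely many terms."""
    return sum(abs(c) ** 2 * r ** (2 * n) for n, c in enumerate(a))


def polynomial(a):
    """Return the function z -> sum_n a[n] z^n, evaluated by Horner's rule."""
    def f(z):
        acc = 0j
        for c in reversed(a):
            acc = acc * z + c
        return acc
    return f


a = [1.0, 2j, -0.5, 0.25]          # arbitrary illustrative coefficients a_0, ..., a_3
f = polynomial(a)
for r in (0.3, 0.6, 0.9, 0.99):
    print(r, circle_mean(f, r, 2), mu2_from_coefficients(a, r))
```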
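Write $f(z) = u(z) + iv(z)$, $f(e^{i\theta}) = u(e^{i\theta}) + iv(e^{i\theta})$, where $u$ and $v$ are real. Let $a_n = \alpha_n + i\beta_n$, where $\alpha_n$, $\beta_n$ are real. Then

$$u(re^{i\theta}) = \sum_{n=0}^{\infty} r^n(\alpha_n\cos n\theta - \beta_n\sin n\theta),$$

so that $\alpha_0 = \frac{1}{2\pi}\int_0^{2\pi} u(e^{i\theta})\,d\theta$ and, for $n \ge 1$,

$$\alpha_n = \frac{1}{\pi}\int_0^{2\pi} u(e^{i\theta})\cos n\theta\,d\theta, \qquad \beta_n = -\frac{1}{\pi}\int_0^{2\pi} u(e^{i\theta})\sin n\theta\,d\theta.$$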
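Thus

$$u(re^{i\theta}) = \frac{1}{2\pi}\int_0^{2\pi} u(e^{i\phi})\,d\phi + \sum_{n=1}^{\infty} r^n\,\frac{1}{\pi}\int_0^{2\pi} u(e^{i\phi})\cos n(\theta-\phi)\,d\phi = \frac{1}{2\pi}\int_0^{2\pi}\frac{1-r^2}{1-2r\cos(\theta-\phi)+r^2}\,u(e^{i\phi})\,d\phi$$

holds for $r < 1$ (the Poisson formula). Similarly

$$v(re^{i\theta}) = v(0) + \sum_{n=1}^{\infty} r^n\,\frac{1}{\pi}\int_0^{2\pi} u(e^{i\phi})\sin n(\theta-\phi)\,d\phi = v(0) + \frac{1}{2\pi}\int_0^{2\pi}\frac{2r\sin(\theta-\phi)}{1-2r\cos(\theta-\phi)+r^2}\,u(e^{i\phi})\,d\phi$$

holds for $r < 1$, where $v(0) = \beta_0$. But then

$$f(re^{i\theta}) = \frac{1}{2\pi}\int_0^{2\pi}\frac{e^{i\phi}+re^{i\theta}}{e^{i\phi}-re^{i\theta}}\,u(e^{i\phi})\,d\phi + iv(0), \qquad r < 1.$$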
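The two representations can be checked numerically for a polynomial $f$: the Poisson integral of $u(e^{i\phi}) = \operatorname{Re} f(e^{i\phi})$ should reproduce $u(re^{i\theta})$, and the kernel $(e^{i\phi}+z)/(e^{i\phi}-z)$ together with the constant $iv(0)$ should reproduce $f(z)$. This is a sketch under stated assumptions: the test polynomial is arbitrary, the integrals are replaced by the trapezoidal rule, and the function names are hypothetical.

```python
import cmath
import math


def f(z):
    # arbitrary test function, analytic everywhere; v(0) = Im f(0) = 0.5
    return (1 + 0.5j) + 2j * z - 0.5 * z ** 2 + 0.25 * z ** 3


def u_boundary(phi):
    """Boundary values u(e^{i phi}) = Re f(e^{i phi})."""
    return f(cmath.exp(1j * phi)).real


def poisson_integral(u, r, theta, m=4096):
    """(1/2pi) int_0^{2pi} (1 - r^2) / (1 - 2r cos(theta - phi) + r^2) u(phi) d phi."""
    total = 0.0
    for k in range(m):
        phi = 2 * math.pi * k / m
        total += (1 - r * r) / (1 - 2 * r * math.cos(theta - phi) + r * r) * u(phi)
    return total / m


def schwarz_integral(u, z, v0, m=4096):
    """(1/2pi) int_0^{2pi} (e^{i phi} + z) / (e^{i phi} - z) u(phi) d phi + i v(0)."""
    total = 0j
    for k in range(m):
        phi = 2 * math.pi * k / m
        w = cmath.exp(1j * phi)
        total += (w + z) / (w - z) * u(phi)
    return total / m + 1j * v0


z = 0.7 * cmath.exp(1.1j)
print(poisson_integral(u_boundary, 0.7, 1.1), f(z).real)
print(schwarz_integral(u_boundary, z, f(0).imag), f(z))
```

Dropping the constant $iv(0)$ shifts the result by exactly $-iv(0)$, which is why the constant appears in the formulas for $v$ and $f$.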


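Let $u(e^{i\phi}) \in L_1$. If

$$u(re^{i\theta}) = \frac{1}{2\pi}\int_0^{2\pi}\frac{1-r^2}{1-2r\cos(\theta-\phi)+r^2}\,u(e^{i\phi})\,d\phi,$$

then $u(re^{i\theta}) \to u(e^{i\theta})$ as $r \to 1-$ almost everywhere. Now, since the Poisson kernel has mean value one over $(0, 2\pi)$ (take $u \equiv 1$ in the Poisson formula),

$$u(re^{i\theta}) - u(e^{i\theta}) = \frac{1}{2\pi}\int_0^{2\pi}\frac{1-r^2}{1-2r\cos(\theta-\phi)+r^2}\,\{u(e^{i\phi}) - u(e^{i\theta})\}\,d\phi.$$

Clearly, for fixed $\varepsilon > 0$,

$$\frac{1}{2\pi}\int_{|\phi-\theta|>\varepsilon}\frac{1-r^2}{1-2r\cos(\theta-\phi)+r^2}\,|u(e^{i\phi}) - u(e^{i\theta})|\,d\phi \to 0$$

as $r \to 1-$. But

$$\Phi(\varepsilon) = \int_{|\phi-\theta|<\varepsilon} |u(e^{i\phi}) - u(e^{i\theta})|\,d\phi = o(\varepsilon)$$

as $\varepsilon \to 0$ for almost all $\theta$. Thus

$$\frac{1}{2\pi}\int_{|\phi-\theta|<1-r}\frac{1-r^2}{1-2r\cos(\theta-\phi)+r^2}\,|u(e^{i\phi}) - u(e^{i\theta})|\,d\phi \le \frac{1}{2\pi}\,\frac{1+r}{1-r}\,\Phi(1-r) = o(1)$$

for almost all $\theta$. Moreover, an integration by parts in $s = |\phi-\theta|$ gives

$$\frac{1}{2\pi}\int_{1-r<|\phi-\theta|<\varepsilon}\frac{1-r^2}{1-2r\cos(\theta-\phi)+r^2}\,|u(e^{i\phi}) - u(e^{i\theta})|\,d\phi \le \frac{1}{2\pi}\,\frac{(1-r^2)\,\Phi(\varepsilon)}{1-2r\cos\varepsilon+r^2} + \frac{1}{2\pi}\,\frac{(1-r^2)\,\Phi(1-r)}{1-2r\cos(1-r)+r^2} + \frac{1}{2\pi}\int_{1-r}^{\varepsilon}\frac{(1-r^2)\,2r\sin s}{(1-2r\cos s+r^2)^2}\,\Phi(s)\,ds.$$

As $r \to 1-$ the first term tends to zero and the second is $o(1)$, as above. Since $1-2r\cos s+r^2 = (1-r)^2+4r\sin^2(s/2) \ge (1-r)^2+4rs^2/\pi^2$ for $0 \le s \le \pi$, the substitution $s = (1-r)t$ shows that for $\tfrac12 \le r < 1$ the last term is at most a constant independent of $r$ times $\sup_{0<s<\varepsilon}\Phi(s)/s$, and this can be made arbitrarily small by choosing $\varepsilon$ small, for almost all $\theta$. Hence $u(re^{i\theta}) \to u(e^{i\theta})$ as $r \to 1-$ almost everywhere.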
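An illustration of this almost-everywhere convergence, assuming the boundary function $u(e^{i\phi}) = 1$ for $0 < \phi < \pi$ and $0$ for $\pi < \phi < 2\pi$, which lies in $L_1$ but is not continuous. Along the radii $\theta = \pi/2$ and $\theta = 3\pi/2$ its Poisson integral has the closed form $\tfrac12 \pm \tfrac{2}{\pi}\arctan r$ (the harmonic measure of the upper semicircle), which tends to $u(e^{i\theta})$ as $r \to 1-$. The midpoint quadrature, the grid size and the function names are choices made for this sketch.

```python
import math


def u_step(phi):
    """Boundary function: 1 on (0, pi), 0 on (pi, 2pi)."""
    return 1.0 if 0.0 < phi % (2 * math.pi) < math.pi else 0.0


def poisson_midpoint(u, r, theta, m=20000):
    """Midpoint-rule approximation of the Poisson integral
    (1/2pi) int_0^{2pi} (1 - r^2) / (1 - 2r cos(theta - phi) + r^2) u(phi) d phi."""
    total = 0.0
    for k in range(m):
        phi = 2 * math.pi * (k + 0.5) / m   # grid points avoid the jumps at 0 and pi
        total += (1 - r * r) / (1 - 2 * r * math.cos(theta - phi) + r * r) * u(phi)
    return total / m


for r in (0.5, 0.9, 0.99):
    up = poisson_midpoint(u_step, r, math.pi / 2)        # tends to u = 1
    down = poisson_midpoint(u_step, r, 3 * math.pi / 2)  # tends to u = 0
    print(r, up, down)
```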

If $f(z) \in H_2$, then $-if(z)$ is analytic in $|z| < 1$ and belongs to $H_2$. But $v(re^{i\theta})$ is the real part of $-if(re^{i\theta})$ and thus converges to $v(e^{i\theta})$ in the mean square as $r \to 1-$. The Poisson formula then holds for $v(re^{i\theta})$ in terms of $v(e^{i\theta})$, and so $v(re^{i\theta}) \to v(e^{i\theta})$ as $r \to 1-$ almost everywhere. Thus, if $f(z) \in H_2$, then $f(re^{i\theta}) \to f(e^{i\theta})$ as $r \to 1-$ for almost all $\theta$.

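We now show that $\mu_1(f;r)$ is a nondecreasing function of $r$. Let $0 < r_1 < r_2 < 1$ and let $k(\theta)$ and $F(z)$ be defined by

$$k(\theta)f(r_1e^{i\theta}) = |f(r_1e^{i\theta})|, \quad |k(\theta)| = 1, \qquad 0 \le \theta < 2\pi,$$

$$F(z) = \frac{1}{2\pi}\int_0^{2\pi} k(\theta)\,f(ze^{i\theta})\,d\theta.$$

$F(z)$ is regular for $|z| \le r_2$, and hence $|F(z)|$ attains its maximum in this closed disc on the boundary, say at $r_2e^{i\alpha}$. Then

$$\mu_1(f;r_1) = F(r_1) \le |F(r_2e^{i\alpha})| \le \mu_1(f;r_2).$$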
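The construction of $F(z)$ can be imitated on a finite grid. The sketch below assumes a polynomial test function and replaces $\frac{1}{2\pi}\int_0^{2\pi}\cdots\,d\theta$ by an average over $M$ equally spaced angles, both choices made only for illustration, with hypothetical names. The discretized $F$ is then itself a polynomial in $z$, its value at $r_1$ equals the grid average of $|f(r_1e^{i\theta})|$, and $|F(r_2e^{is})|$ is bounded by the grid average of $|f|$ on the rotated circle of radius $r_2$, mirroring the two inequalities above.

```python
import cmath
import math


def f(z):
    # arbitrary test polynomial, an assumption made only for this illustration
    return 1 + 2j * z - 0.5 * z ** 2 + 0.25 * z ** 3


M = 512                                                  # number of grid angles theta_j
GRID = [cmath.exp(2j * math.pi * j / M) for j in range(M)]


def mean_abs(r, shift=0.0):
    """Grid average of |f(r e^{i(shift + theta_j)})|, a stand-in for mu_1(f; r)."""
    rot = cmath.exp(1j * shift)
    return sum(abs(f(r * rot * e)) for e in GRID) / M


def make_F(r1):
    """Discretized F(z) = (1/2pi) int k(theta) f(z e^{i theta}) d theta, where |k| = 1 and
    k(theta) f(r1 e^{i theta}) = |f(r1 e^{i theta})| (k = 1 wherever f vanishes)."""
    k = []
    for e in GRID:
        w = f(r1 * e)
        k.append(abs(w) / w if w != 0 else 1.0)
    return lambda z: sum(kj * f(z * e) for kj, e in zip(k, GRID)) / M


r1, r2 = 0.4, 0.8
F = make_F(r1)
print(F(r1), mean_abs(r1))    # F(r1) reproduces the grid mean of |f| on radius r1
boundary = [abs(F(r2 * cmath.exp(2j * math.pi * t / 360))) for t in range(360)]
print(max(boundary), mean_abs(r2))   # max of |F| on radius r2 vs. mean of |f| there
```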

We show that every function $f(z) \in H_1$ has a decomposition into two factors $f(z) = g(z)h(z)$, where $h(z)$ is regular and bounded in $|z| < 1$, $|h(e^{i\theta})| = 1$, while $g(z) \in H_1$ and never vanishes in $|z| < 1$. This theorem holds for $H_\alpha$ in general and is due to F. Riesz, "Über die Randwerte einer analytischen Funktion", Math. Zeit., vol. 18, pp. 87-95. If $f(z)$ does not vanish for $|z| < 1$, the representation is given by $g(z) = f(z)$, $h(z) = 1$. Let $\alpha_1, \alpha_2, \dots$ be the zeros of $f(z)$ in $|z| < 1$ (each counted according to its multiplicity) arranged in order of increasing absolute value. First assume $f(0) \ne 0$. Set

$$h_n(z) = \prod_{k=1}^{n}\frac{|\alpha_k|}{\alpha_k}\,\frac{\alpha_k - z}{1 - \bar{\alpha}_k z}, \qquad g_n(z) = \frac{f(z)}{h_n(z)}.$$

$h_n(z)$ and $g_n(z)$ are regular in $|z| < 1$. On the circle $z = e^{i\theta}$, $|h_n(z)| = 1$, and for every $n$ and for every positive $\varepsilon$

$$|h_n(re^{i\theta})| \ge 1 - \varepsilon$$

as soon as $r$ lies close enough to 1. Then

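$$\mu_1(g_n;r) \le \frac{1}{1-\varepsilon}\,\mu_1(f;r) \le \frac{1}{1-\varepsilon}\,\mu_1(f)$$

if $r$ lies sufficiently close to 1. But since $\mu_1(g_n;r)$ is a nondecreasing function of $r$,

$$\mu_1(g_n;r) \le \frac{1}{1-\varepsilon}\,\mu_1(f)$$

for all $r < 1$, and on letting $\varepsilon \to 0$, $\mu_1(g_n;r) \le \mu_1(f)$. If there are only a finite number of zeros $\alpha_1, \alpha_2, \dots, \alpha_n$, we already have the desired decomposition

$$f(z) = g_n(z)h_n(z).$$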
If there are infinitely many zeros, then

$$|g_n(0)| = \mu_1(g_n;0) \le \mu_1(f), \qquad |g_n(0)| = \frac{|f(0)|}{|\alpha_1\cdots\alpha_n|},$$

so that

$$|\alpha_1\cdots\alpha_n| \ge |f(0)|\,\{\mu_1(f)\}^{-1}$$

for every $n$, and hence the infinite product $\prod_k |\alpha_k|$ converges. But then the infinite product

$$h(z) = \lim_{n\to\infty} h_n(z) = \prod_{k=1}^{\infty}\frac{|\alpha_k|}{\alpha_k}\,\frac{\alpha_k - z}{1 - \bar{\alpha}_k z}$$

converges uniformly in every circle $|z| \le r < 1$. The function $h(z)$ is regular in $|z| < 1$ and there $|h(z)| \le 1$. Note that $|h(e^{i\theta})| = 1$. $h(z)$ has the same zeros as $f(z)$, so that for $|z| < 1$ the function

$$g(z) = \frac{f(z)}{h(z)}$$

is regular and not zero. Further, $g_n(z)$ converges to $g(z)$ uniformly on every circle $|z| \le r < 1$, so that $\mu_1(g;r) \le \mu_1(f)$. Now if $f(0) = 0$, set $f(z) = z^m f_1(z)$, where $f_1(z)$ is regular at $z = 0$ and $f_1(0) \ne 0$. Then

$$\mu_1(f_1;r) = r^{-m}\mu_1(f;r) \to \mu_1(f),$$

so that $\mu_1(f_1) = \mu_1(f)$ and in particular $f_1(z) \in H_1$. The decomposition $f_1(z) = g(z)h(z)$ yields the decomposition of $f(z)$ into $g(z)$ and $z^m h(z)$.
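A small check of the finite construction, assuming the hypothetical test function $f(z) = (z-\alpha_1)(z-\alpha_2)(z-2)$ with $\alpha_1 = 0.5$ and $\alpha_2 = -0.3+0.4i$ inside the unit circle and one zero outside it. The code checks numerically that $h_n$ has modulus one on $|z| = 1$, that $h_n(0) = |\alpha_1\alpha_2|$, and that $g_n = f/h_n$ has the same modulus as $f$ on the boundary while $|g_n(0)| = |f(0)|/|\alpha_1\alpha_2|$. The names are illustrative only.

```python
import cmath
import math

ZEROS_INSIDE = [0.5, -0.3 + 0.4j]          # alpha_1, alpha_2 with 0 < |alpha_k| < 1


def f(z):
    # test function with the two zeros above and one zero outside the unit circle
    return (z - ZEROS_INSIDE[0]) * (z - ZEROS_INSIDE[1]) * (z - 2.0)


def blaschke(zeros, z):
    """h_n(z) = prod_k (|a_k| / a_k) (a_k - z) / (1 - conj(a_k) z)."""
    p = 1 + 0j
    for a in zeros:
        p *= (abs(a) / a) * (a - z) / (1 - a.conjugate() * z)
    return p


def g(z):
    # g_n = f / h_n; the removable singularities at the zeros of h_n are not evaluated here
    return f(z) / blaschke(ZEROS_INSIDE, z)


for t in range(4):
    e = cmath.exp(1j * (2 * math.pi * t / 4 + 0.3))
    print(abs(blaschke(ZEROS_INSIDE, e)), abs(g(e)), abs(f(e)))
print(blaschke(ZEROS_INSIDE, 0), abs(g(0)))
```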

Given any function $f \in H_1$, consider its decomposition into $g(z)$ and $h(z)$. Since $g(z)$ vanishes nowhere in the unit circle, there is a single-valued regular determination $\gamma(z)$ of $\{g(z)\}^{1/2}$. Now $\gamma(z) \in H_2$, since $\mu_2(\gamma;r) = \mu_1(g;r) \le \mu_1(f)$. The function $h(z)$ is bounded and thus also belongs to $H_2$. It then follows from what has been proved earlier that $\gamma(re^{i\theta})$, $h(re^{i\theta})$ approach $\gamma(e^{i\theta})$, $h(e^{i\theta})$ respectively almost everywhere as $r \to 1-$. It is clear that $f(re^{i\theta})$ approaches $f(e^{i\theta}) = \{\gamma(e^{i\theta})\}^2 h(e^{i\theta})$ almost everywhere as $r \to 1-$.

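We shall now show that if $f(z) \in H_1$, then

$$\int_0^{2\pi} |f(e^{i\theta}) - f(re^{i\theta})|\,d\theta \to 0$$

as $r \to 1-$. First we show that for any measurable set $M$

$$\int_M |f(r_ne^{i\theta})|\,d\theta \to \int_M |f(e^{i\theta})|\,d\theta,$$

where $r_n \to 1-$. Now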

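$$D = \int_M \{|f(e^{i\theta})| - |f(r_ne^{i\theta})|\}\,d\theta = \int_M \{|h|\,|\gamma|^2 - |h_n|\,|\gamma_n|^2\}\,d\theta = \int_M \{|h| - |h_n|\}\,|\gamma|^2\,d\theta - \int_M |h_n|\,\{|\gamma_n|^2 - |\gamma|^2\}\,d\theta,$$

where here $h$, $\gamma$ stand for $h(e^{i\theta})$, $\gamma(e^{i\theta})$ and $h_n$, $\gamma_n$ for $h(r_ne^{i\theta})$, $\gamma(r_ne^{i\theta})$. Since $|h(re^{i\theta})| \le 1$, it follows that the integrand of the first integral is at most $|\gamma|^2$ in absolute value, and since $h_n \to h$ almost everywhere, the first expression tends to zero.

The integrand of the second integral is at most $\big||\gamma|^2 - |\gamma_n|^2\big| \le |\gamma^2 - \gamma_n^2| = |\gamma - \gamma_n|\,|\gamma + \gamma_n|$ in absolute value, and so the second integral is at most the square root of

$$\int_M |\gamma - \gamma_n|^2\,d\theta \int_M |\gamma + \gamma_n|^2\,d\theta \le 8\pi\,\mu_1(f)\int_M |\gamma - \gamma_n|^2\,d\theta \to 0$$

(both $\int_0^{2\pi}|\gamma|^2\,d\theta$ and $\int_0^{2\pi}|\gamma_n|^2\,d\theta$ are at most $2\pi\mu_1(f)$)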

by the Schwarz inequality. Since $f(re^{i\theta}) \to f(e^{i\theta})$ almost everywhere, by Egoroff's theorem there are sets $M$ of arbitrarily small measure such that $f(r_ne^{i\theta}) \to f(e^{i\theta})$ uniformly in the complementary set. Let $M$ be of sufficiently small measure so that

$$\int_M |f|\,d\theta < \varepsilon.$$

Then for $n$ large enough

$$\int_M |f_n|\,d\theta < \varepsilon,$$

so that $\int_M |f - f_n|\,d\theta < 2\varepsilon$. However,

$$\int_{CM} |f - f_n|\,d\theta \to 0$$

because of uniform convergence. Here $CM$ is the complement of $M$, $f = f(e^{i\theta})$ and $f_n = f(r_ne^{i\theta})$. Since $\varepsilon$ is arbitrarily small, the theorem is proven.
Noise, coherent, 46.
  shot, 41.
  nonlinear, 51.
  translation, 44.
Poisson process, 23.
Potential energy, 176.
Prediction, error of, 74.
  linear, 65.
Prefiltering, 273.
Quadratic forms, distribution of, 101.
  ratios of, 105.
Rectifier, linear, 53.
  square law, 54.
Regression analysis, 226.
  polynomial regression, 245.
  pulse regression, 254.
  regression spectrum, 240.
    elements of, 241.
  trigonometric regression, 245.
  transient, 49.
Reynolds number, 164.
Sampling, periodic, 57.
Signals, detection of, 255.
Spectral density function, 35.
Spectral distribution function, 35.
Spectrograph estimate, variance of, 131.
  bias of, 129.
Spectroscope, grating, 156.
Spectrum, cross-, 46.
  estimation of, 117.
  singular part of, 35.
Stochastic process, differential, 26.
  harmonizable, 59.
  maxima of a, 273.
  purely deterministic, 75.
  purely nondeterministic, 75.
  strictly stationary, 29.
  weakly stationary, 33.
  normal, 23.
  vector, 44.
  with orthogonal increments, 26.
Stochastic set function, 25.
Taylor's hypothesis, 171.
Tests, of normality, 275.
Time series, smoothing of, 95.
  artificially generated, 214.
Turbulence, isotropic, 165.
  homogeneous, 165.
  velocity field of, 164.
Variate difference method, 94.
Viscosity, 164.
Weight function, Bartlett's, 146.
  rectangular, 147.
  Tukey's, 149.
Zeros of time series, 271.